Abstract: Based on two independent samples $X_1, \dots, X_m$ and $X_{m+1}, \dots, X_n$ drawn from
multivariate distributions with unknown Lebesgue densities $p$ and $q$ respectively, we propose
an exact multiple test in order to identify simultaneously regions of significant deviations
between $p$ and $q$. The construction is built from randomized nearest-neighbor statistics. It
does not require any preliminary information about the multivariate densities such as compact
support, strict positivity or smoothness and shape properties. The properly adjusted
multiple testing procedure is shown to be sharp-optimal for typical arrangements of the
observation values which appear with probability close to one. The proof relies on a new
coupling Bernstein type exponential inequality, reflecting the non-subgaussian tail behavior
of a combinatorial process. For power investigation of the proposed method a reparametrized
minimax set-up is introduced, reducing the composite hypothesis "$p = q$" to a simple one
with the multivariate mixed density $(m/n)p + (1 - m/n)q$ as infinite dimensional nuisance
parameter. Within this framework, the test is shown to be spatially and sharply asymptotically
adaptive with respect to uniform loss on isotropic Hölder classes. The exact minimax
risk asymptotics are obtained in terms of solutions of the optimal recovery.

AMS 2000 subject classifications: 62G10, 62G20. 

Keywords and phrases: Combinatorial process, exponential concentration bound, coupling,
decoupling inequality, exact multiple test, nearest-neighbors, optimal recovery, sharp
asymptotic adaptivity.



1. Introduction 

Given two independent multivariate iid samples

$X_1, \dots, X_m$ and $X_{m+1}, \dots, X_n$

with corresponding Lebesgue densities $p$ and $q$ respectively, we are interested in identifying
simultaneously subregions of the densities' support where $p$ deviates significantly from $q$ at a
prespecified but arbitrarily chosen level $\alpha \in (0, 1)$. For this aim a multiple test of the
composite hypothesis $H_0 : p = q$ versus $H_A : p \neq q$ is proposed, built from a suitable
combination of randomized nearest-neighbor statistics. The procedure does not require any
preliminary information about the multivariate densities such as compact support, strict positivity
or smoothness and shape properties, and it is valid for arbitrary finite sample sizes $m$ and $n - m$.
The hierarchical structure of p-values for subsets of deviation between $p$ and $q$ provides insight
into the local power of nearest-neighbor classifiers, based on the training set $\{X_1, \dots, X_n\}$.
Thus our method is of interest in particular if the classification error depends strongly on the
value of the feature vector, related to recent literature on classification procedures by
Belomestny and Spokoiny (2007).

There is an extensive literature on two-sample problems. Most of it is devoted to the
one-dimensional case, as there exists the simple but powerful "quantile transformation", allowing
for distribution-freeness under the null hypothesis of several test statistics. Starting
from the classical univariate mean shift problem (see e.g. Hájek and Šidák 1967), more flexible
alternatives such as stochastically larger or omnibus alternatives have been investigated for
instance by Behnen, Neuhaus and Ruymgaart (1983), Neuhaus (1982, 1987), Fan (1996),
Janic-Wróblewska and Ledwina (2000), and Ducharme and Ledwina (2003). Our approach is different in
that it aims at spatially adaptive and simultaneous identification of local rather than global
deviations. In the above cited literature asymptotic power is discussed against single directional
alternatives tending to zero at a prespecified rate, typically formulated by means of the densities
$\tilde p$ and $\tilde q$ corresponding to the transformed observations $\tilde X_i = H(X_i)$, where
$H$ denotes the mixed distribution function with density $h = (m/n)p + (1 - m/n)q$. Note that the
mapping $H$ coincides with the inverse quantile transformation under the null.

For power investigation of our procedure a specific two-sample minimax set-up is introduced. It is
based on a reparametrization of $(p, q)$ to a couple $(\phi, h)$, reducing the composite hypothesis
"$p = q$" to the simple one "$\phi = 0$" with the multivariate mixed density $h$ as infinite
dimensional nuisance parameter. The reparametrization differs conceptually from the transformation
described above for the univariate situation, as it cannot rely on the inverse mixed distribution
function. Nevertheless, under moderate additional assumptions it leads in that case to the same
notion of efficiency. In order to explore the power of our method, the alternative is assumed to be
of the form

$$\{(p, q) : (m/n)p + (1 - m/n)q = h,\ \phi \in \mathcal F,\ \|\phi\| \geq \delta\} \qquad (1)$$

for fixed but unknown $h$, some suitably chosen (semi-)norm $\|\cdot\|$, a constant $\delta > 0$ and a
given smoothness class $\mathcal F$. For any $\alpha \in (0, 1)$ the quality of a statistical
level-$\alpha$ test $\psi$ is then quantified by its minimal power $\inf \mathbb E_{(p,q)} \psi$, where
the infimum runs over all couples $(p, q)$ which are contained in the set (1). It is a general
problem that an optimal solution $\psi$ may depend on $\mathcal F$ and $h$. Since the smoothness and
shape of a potential difference $p - q$ are rarely known in practice, it is of interest to come up
with a procedure which does not depend on these properties but is (almost) as good as if they were
known, leading to the notion of minimax adaptive testing as introduced in Spokoiny (1996). Note,
however, that here we have $h$ as an additional infinite dimensional nuisance parameter.

The problem of data-driven testing of a simple hypothesis is further investigated for instance by
Ingster (1987), Eubank and Hart (1992), Ledwina (1994), Ledwina and Kallenberg (1995), Fan
(1996) and Dümbgen and Spokoiny (2001) among others, the two-sample context by Butucea and
Tribouley (2006). The common idea is to combine a family of test statistics corresponding to
different values of the smoothing parameters; see, for instance, Rufibach and Walther
(2008) for a general criterion of multiscale inference. The closest in spirit to ours is the
procedure developed in Dümbgen and Spokoiny (2001) within the continuous time Gaussian white noise
model and further explored by Dümbgen (2002), Dümbgen and Walther (2008) and Rohde (2008),
all concerned with univariate problems. Walther (2010) treats the problem of spatial cluster
analysis in two dimensions.

The paper is organized as follows. In the subsequent section, a multiple randomization test is
introduced, built from a combination of suitably standardized nearest-neighbor statistics. Its
calibration relies on a new coupling exponential bound and an appropriate extension of the
multiscale empirical process theory. Asymptotic power investigations and adaptivity properties are
studied in Section 3, where the reparametrized minimax set-up is introduced. It is shown that our
procedure is sharply asymptotically adaptive with respect to the sup-norm $\|\cdot\|$ on isotropic
Hölder classes, i.e. minimax efficient over a broad range of Hölder smoothness classes
simultaneously. The application to local classification is discussed in Section 4. The
one-dimensional situation is considered separately in Section 5, where an alternative approach
based on local pooled order statistics is proposed. In that case the statistic does not depend on
the observations explicitly but only on their order, which in contrast to nearest-neighbor
relations is invariant under the quantile transformation. Section 6.1 is concerned with a
decoupling inequality and the coupling exponential bounds which are essential for our construction.
Both results are of independent theoretical interest. All proofs and auxiliary results about
empirical processes are deferred to Section 6.2 and Section 6.3.



2. Combining randomized nearest-neighbor statistics 

The procedure below is mainly designed for dimension $d \geq 2$. The univariate case contains a few
special features and is considered separately in Section 5. Let $X := (X_1, \dots, X_n)'$ and denote
by $\mathcal X_n$ the pooled set of observations. For any $0 \leq k \leq n - 1$, the $k$'th
nearest-neighbor of $X_j \in \mathcal X_n$ with respect to the Euclidean distance is denoted by
$X_j^k$; we define $X_j^0 := X_j$. Note that the nearest-neighbors are unique a.s. The weighted
labels are defined as follows:

$$\Lambda(X_i) := \begin{cases} n/m & \text{if } X_i \text{ is contained in the first sample,} \\ -n/(n-m) & \text{otherwise.} \end{cases}$$



In order to judge a possible deviation of $p$ from $q$ on a given set $B \in \mathcal B^d$, a natural
statistic to look at is a standardized version of $P_n(B) - Q_n(B)$ or, more sophisticated,

$$\int_B k_B \, d(P_n - Q_n)$$

for some kernel $k_B$ supported by $B$, where $P_n$ and $Q_n$ denote the empirical measures
corresponding to the first and second sample, respectively. Note that the statistic is not
distribution-free, and in order to build up a multiple testing procedure several statistics
corresponding to different sets $B$ have to be combined in a certain way.

2.1. Local nearest-neighbor statistics 

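Let $\psi : [0, \infty) \to \mathbb R$ denote any kernel of bounded total variation with
$\max_{x \in [0,\infty)} \psi(x) = \psi(0) = 1$ and $\psi(x) = 0$ for $x > 1$. We introduce the local
test statistics

$$T_{jkn} := \sqrt{(m/n)(1 - m/n)}\ \frac{1}{\sqrt n\, \gamma_{jkn}} \sum_{i=0}^{n-1} \psi\Big( \frac{\|X_j^i - X_j\|_2}{\|X_j^k - X_j\|_2} \Big) \Lambda(X_j^i)
= \sqrt{(m/n)(1 - m/n)}\ \frac{\sqrt n}{\gamma_{jkn}} \int \psi\Big( \frac{\|X_j - x\|_2}{\|X_j^k - X_j\|_2} \Big)\, d(P_n - Q_n)(x),$$

where

$$\gamma_{jkn}^2 := \frac{1}{n-1} \sum_{i=0}^{n-1} \Big[ \psi\Big( \frac{\|X_j^i - X_j\|_2}{\|X_j^k - X_j\|_2} \Big) - \bar\psi_{jkn} \Big]^2, \qquad \bar\psi_{jkn} := \frac 1n \sum_{i=0}^{n-1} \psi\Big( \frac{\|X_j^i - X_j\|_2}{\|X_j^k - X_j\|_2} \Big).$$

Every $T_{jkn}$ is, in a certain sense, a standardized weighted average of the nearest-neighbors'
labels, and its absolute value should tend to be large whenever $p$ is clearly larger or smaller
than $q$ within the random Euclidean ball with center $X_j$ and radius $\|X_j^k - X_j\|_2$.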
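As an illustration, a local statistic of this type can be computed along the following lines. This is a minimal sketch rather than the paper's implementation: the function names are hypothetical, the first `m` rows of `X` are assumed to form the first sample, and the normalization `gamma` is an assumption chosen so that the statistic has variance one under random permutation of the labels.

```python
import numpy as np

def weighted_labels(n, m):
    """Label vector: n/m for the first m points, -n/(n-m) for the remaining ones."""
    lam = np.empty(n)
    lam[:m] = n / m
    lam[m:] = -n / (n - m)
    return lam

def rectangular_kernel(x):
    """psi(x) = 1 for 0 <= x <= 1 and psi(x) = 0 for x > 1."""
    return (np.asarray(x) <= 1.0).astype(float)

def local_statistic(X, m, j, k, psi=rectangular_kernel):
    """Standardized weighted label average around X[j] at scale k (1 <= k <= n-1).

    Assumption: the kernel weights are not all equal, since otherwise the
    normalization gamma below is zero.
    """
    n = len(X)
    lam = weighted_labels(n, m)
    dist = np.linalg.norm(X - X[j], axis=1)   # Euclidean distances to the center X[j]
    radius = np.sort(dist)[k]                 # distance to the k-th nearest neighbor
    w = psi(dist / radius)                    # kernel weight of every pooled point
    gamma = np.sqrt(np.sum((w - w.mean()) ** 2) / (n - 1))
    frac = m / n
    return np.sqrt(frac * (1 - frac)) * np.dot(w, lam) / (np.sqrt(n) * gamma)

# Two well separated groups of four points each; the ball around X[0] at
# scale k = 3 contains exactly the first sample.
X = np.array([[0, 0], [1, 0], [2, 0], [3, 0],
              [10, 0], [11, 0], [12, 0], [13, 0]], dtype=float)
print(local_statistic(X, m=4, j=0, k=3))   # equals sqrt(7) up to rounding
```

With the rectangular kernel the weights are simply the indicators of the `k + 1` pooled points closest to the center, so the numerator is `n` times the difference of the two empirical measures of the ball.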

2.2. Adjustment for multiple testing 

The idea is to build up a multiple test, combining all possible local statistics $T_{jkn}$. The
typical way is to consider the distribution of the supremum $\sup_{j,k} |T_{jkn}|$; see, e.g.,
Heckmann and Gijbels (2004). The problem is that the distribution is driven by small scales with a
corresponding loss of power at larger scales, as there are many more small scales which contribute
to the supremum. Here, we aim at a supremum type test statistic

$$T_n := \sup_{0 \leq k \leq n-1}\ \sup_{1 \leq j \leq n} \big( |T_{jkn}| - C_{jkn} \big),$$

where the constants $C_{jkn}$ are appropriately chosen correction terms (independent of the label
vector $\Lambda$) for adjustment of multiple testing within every "scale" $k$ of
$k$-nearest-neighbor statistics. These correction terms in the calibration aim to treat all the
scales roughly equally. Although the distribution of $T_n$ under the null hypothesis depends on the
unknown underlying distribution $p = q$, the conditional distribution
$\mathcal L_0(T_n \mid \mathcal X_n)$ of the above statistic is invariant under permutation of the
components of the label vector $\Lambda$. Here and subsequently, the index "0" indicates the null
hypothesis, i.e. any couple $(p, q)$ with $p = q$. Precisely, let the random variable $\Pi$ be
uniformly distributed on the symmetric group $S_n$ of order $n$, independent of $X$. Then
$\mathcal L_0(T_n \mid \mathcal X_n) = \mathcal L(T_n \circ \Pi \mid \mathcal X_n)$, where
$(T_n \circ \Pi)(\Lambda) := T_n(\Lambda_{\Pi_1}, \dots, \Lambda_{\Pi_n})$. Elementary calculation
entails that

$$\mathbb E\big( T_{jkn} \circ \Pi \mid \mathcal X_n \big) = 0 \quad \text{and} \quad \operatorname{Var}\big( T_{jkn} \circ \Pi \mid \mathcal X_n \big) = 1.$$

Thus the null hypothesis is satisfied if, and only if, the hypothesis of permutation invariance (or
complete randomness) conditional on $\mathcal X_n$ is satisfied.
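The "elementary calculation" for the conditional moments rests on the classical mean and variance formulas for a linear permutation statistic $\sum_i a_i \Lambda_{\Pi(i)}$. The sketch below checks them by exhaustive enumeration for a small sample; the function names and the weight vector `a` are illustrative assumptions, not objects from the paper.

```python
import itertools
import numpy as np

def permutation_moments(a, lam):
    """Exact mean and variance of sum_i a[i] * lam[pi(i)] over all permutations pi."""
    a = np.asarray(a, dtype=float)
    lam = np.asarray(lam, dtype=float)
    vals = np.array([np.dot(a, lam[list(p)])
                     for p in itertools.permutations(range(len(a)))])
    return vals.mean(), vals.var()   # population variance over the uniform law on S_n

def closed_form_variance(a, lam):
    """Variance formula: sum (a - mean a)^2 * sum (lam - mean lam)^2 / (n - 1)."""
    a = np.asarray(a, dtype=float)
    lam = np.asarray(lam, dtype=float)
    n = len(a)
    return np.sum((a - a.mean()) ** 2) * np.sum((lam - lam.mean()) ** 2) / (n - 1)

# n = 5, m = 2: labels n/m = 2.5 and -n/(n-m) = -5/3, which sum to zero.
lam = [2.5, 2.5, -5 / 3, -5 / 3, -5 / 3]
a = [1.0, 0.5, 0.2, 0.0, 0.0]          # arbitrary illustrative kernel weights
mean, var = permutation_moments(a, lam)
print(mean, var, closed_form_variance(a, lam))
```

Because the labels sum to zero, the mean vanishes for any weight vector, and dividing the statistic by the square root of the closed-form variance yields unit variance.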

An adequate calibration of the randomized nearest-neighbor statistics, i.e. the choice of smallest
possible constants $C_{jkn}$, requires an exact understanding of both their tail behavior and their
dependency structure. Note that the randomized nearest-neighbor statistics have a geometrically
involved dependency structure. Even in case of the rectangular kernel $\psi$ it depends explicitly
on the "random design" $\mathcal X_n$, which complicates the sharp-optimal calibration for multiple
testing compared to univariate problems, where the dependency of the single test statistics
typically remains invariant under monotone transformation of the design points. Also, the optimal
correction originally designed for Gaussian tails in Dümbgen and Spokoiny (2001) does not carry
over, as only the subsequent Bernstein type exponential tail bound is available.

A coupling exponential inequality. Based on an explicit coupling, the following proposition
extends and tightens the exponential bounds derived in Serfling (1974) for a combinatorial process
in the present framework. If not stated otherwise, the random variable $\Pi$ is uniformly
distributed on $S_n$, independent of $X$.

Proposition 1. Let Tjkn be as introduced above and define 

-1 



Then 



where 



S{m,n) := |Emin^ — , j with S* ~ Bin(n, m/r 

^V2 



Tjkn on > 5{m,n)r] 



R^{m,n) 



Xn] < 2 cxp - 



2||V'||sup max(m, n — m) 
3 -^Z in{n — m) 



Remark. The expression $\delta(m,n)$ is the payment for decoupling which appears by replacing the
tail probability of a hypergeometric ensemble by that of the Binomial analogue. For details we refer
to Section 6.1. In the typical case $0 < \liminf_n (m/n) \leq \limsup_n (m/n) < 1$ we obtain
$\delta(m,n) = 1 + O(n^{-1/2})$. Compared to results obtained for weighted averages of standardized,
independent Bernoullis, the above Bernstein type bound appears to be nearly optimal, i.e.
subgaussian tail behavior (with leading constant 1/2) is actually not present.

Via inversion of the above exponential inequality, additive correction terms $C_{jkn}$ for
adjustment of multiple testing are constructed. The next theorem motivates our approach. The
construction is designed for typical arrangements of the observation values which appear with
probability close to one. To avoid technical expenditure, we restrict our attention to compactly
supported densities. $d_w$ denotes the dual bounded Lipschitz metric (see, e.g., van der Vaart and
Wellner 1996) which generates the topology of weak convergence. "$\to_{\mathbb P_n}$" refers to
convergence in probability along the sequence of distributions $(\mathbb P_n)$.

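Theorem 2. Define the test statistic

$$T_n := \sup_{1 \leq j \leq n,\ 0 \leq k \leq n-1} \big\{ |T_{jkn}| - C_{jkn} \big\}$$

with

$$C_{jkn} := 3 R_n \gamma_{jkn}^{-1} \delta(m,n) \Gamma_{jkn} + \delta(m,n) \sqrt{2 \Gamma_{jkn}},$$

where $R_n := n^{-1/2} R(m,n)$ and $\Gamma_{jkn} := \log(1/\gamma_{jkn}^2)$. Assume that the sequence
of mixed densities $h_n := (m/n)p_n + (1 - m/n)q_n$ on $[0,1]^d$ is equicontinuous and uniformly
bounded away from zero, while $0 < \liminf_n m/n \leq \limsup_n m/n < 1$. Then the sequence
$\mathcal L(T_n \circ \Pi \mid \mathcal X_n)$ of conditional distributions is tight in
$(\mathbb P_n^{\otimes m} \otimes \mathbb Q_n^{\otimes (n-m)})$-probability. Additionally,

$$d_w\big( \mathcal L(T_n \circ \Pi \mid \mathcal X_n),\ \mathcal L(T_{h_n}) \big) \to_{\mathbb P_n} 0,$$

where

$$T_{h_n} := \sup_{t \in [0,1]^d,\ 0 < r \leq \max_{x \in [0,1]^d} \|x - t\|_2} \Big\{ \frac{\big| \int_{[0,1]^d} \phi_{rt,n}(x)\, dW(x) \big|}{\gamma_{rt,n}} - \sqrt{2 \log(1/\gamma_{rt,n}^2)} \Big\}$$

with $W$ a standard Brownian sheet in $[0,1]^d$,
$\gamma_{rt,n} := \big( \int_{[0,1]^d} \phi_{rt,n}(x)^2\, dx \big)^{1/2}$ and

$$\phi_{rt,n}(x) := \Big[ \psi\Big( \frac{\|x - t\|_2}{r} \Big) - \int_{[0,1]^d} \psi\Big( \frac{\|z - t\|_2}{r} \Big) h_n(z)\, dz \Big] \sqrt{h_n(x)}.$$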

The extra term $3 R_n \gamma_{jkn}^{-1} \delta(m,n) \Gamma_{jkn}$ in the constant $C_{jkn}$ results
from the exponential inequality in Proposition 1 and can be viewed as an additional penalty for
non-subgaussianity. The theorem entails in particular that the sequence
$\mathcal L(T_n \circ \Pi \mid \mathcal X_n)$ is weakly approximated in probability by a tight
sequence of non-degenerate distributions $\mathcal L(T_{h_n})$, which indicates that our corrections
$C_{jkn}$ are appropriately defined and cannot be chosen essentially smaller. Note that the
approximation $\mathcal L(T_{h_n})$ depends on the unknown mixed distribution even under the null
hypothesis.



2.3. The multiple rerandomization test 

Let $\kappa_\alpha(X) := \operatorname{argmin}_{C > 0} \{ \mathbb P(T_n \circ \Pi \leq C \mid \mathcal X_n) \geq 1 - \alpha \}$
denote the generalized $(1 - \alpha)$-quantile of $\mathcal L(T_n \circ \Pi \mid \mathcal X_n)$. Then
we propose the conditional test

$$\varphi^*_\alpha(X) := \begin{cases} 0 & \text{if } T_n \leq \kappa_\alpha(X), \\ 1 & \text{if } T_n > \kappa_\alpha(X). \end{cases}$$

Our method can be viewed as a multiple testing procedure. For a given set of observations
$\{X_1, \dots, X_n\}$, the corresponding test statistic exceeds the $(1 - \alpha)$-quantile if, and
only if, the random set

$$\mathcal D_\alpha := \big\{ B_{X_j}(\|X_j^k - X_j\|_2) \ \big|\ 1 \leq j \leq n,\ 0 \leq k \leq n-1;\ |T_{jkn}(X)| > C_{jkn}(X) + \kappa_\alpha(X) \big\}$$

is nonempty, where $B_t(r)$ denotes the Euclidean ball in $\mathbb R^d$ with center $t$ and radius
$r$. Since the test is valid conditional on the set of observations, we may conclude that $p$
deviates from $q$ at significance level $\alpha$ on every Euclidean ball
$B_t(r) \in \mathcal D_\alpha$. In order to reduce the computational expenditure and to increase
sensitivity on smaller scales, one may restrict one's attention to pairs $(j, k)$ for $k \leq m$ for
some integer $m \in \{1, \dots, n-1\}$. Note that the validity of the test does not require any
assumption about the densities, not even Lebesgue continuity.
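The conditional test can be sketched as follows. This is a simplified illustration under stated assumptions, not the calibrated procedure of the paper: the kernel is rectangular, the exact conditional quantile is replaced by a Monte Carlo approximation over `n_perm` random label permutations, the standardization is chosen to give unit permutation variance, and the correction terms default to zero (a hypothetical placeholder for the calibrated constants $C_{jkn}$, which require the quantities of Proposition 1). All function names are illustrative.

```python
import numpy as np

def neighbor_order(X):
    """order[j, i] = index of the i-th nearest neighbor of X[j]; order[j, 0] == j."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.argsort(dist, axis=1)

def sup_statistic(order, lam, corrections):
    """Maximum over centers j and scales k = 1..n-2 of |T_jk| - corrections[k-1]."""
    n = len(lam)
    frac = np.mean(lam > 0)                             # m / n
    csum = np.cumsum(lam[order], axis=1)[:, 1:n - 1]    # label sums over the k+1 nearest points
    k = np.arange(1, n - 1)
    w_mean = (k + 1) / n                                # mean of the indicator weights
    gamma = np.sqrt(n * w_mean * (1 - w_mean) / (n - 1))
    T = np.sqrt(frac * (1 - frac)) * csum / (np.sqrt(n) * gamma)
    return np.max(np.abs(T) - corrections)

def rerandomization_test(X, m, alpha=0.05, n_perm=199, corrections=None, seed=0):
    """Monte Carlo permutation test; the first m rows of X are the first sample."""
    n = len(X)
    if corrections is None:
        corrections = np.zeros(n - 2)   # placeholder only, NOT the calibrated C_jkn
    lam = np.concatenate([np.full(m, n / m), np.full(n - m, -n / (n - m))])
    order = neighbor_order(X)           # depends on X only, so it is computed once
    rng = np.random.default_rng(seed)
    t_obs = sup_statistic(order, lam, corrections)
    t_perm = np.array([sup_statistic(order, rng.permutation(lam), corrections)
                       for _ in range(n_perm)])
    p_value = (1 + np.sum(t_perm >= t_obs)) / (n_perm + 1)
    return t_obs, p_value, bool(p_value <= alpha)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)),    # first sample
               rng.normal(20.0, 1.0, size=(20, 2))])  # second sample, far away
print(rerandomization_test(X, m=20))
```

Since only the labels are permuted, the nearest-neighbor structure is computed once and reused for every permutation. With zero corrections the supremum is dominated by the many small scales, which is exactly the effect the calibrated constants are meant to remove.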

Recently, Walther (2010) proposed a multiple test for cluster analysis in two dimensions based
on a suitable combination of local log-likelihood ratio statistics, evaluated on a fixed choice of
axis-parallel rectangles. These statistics are not linear in $P_n$ and $Q_n$, respectively, but
result in a subgaussian tail behavior.

3. Asymptotic power

3.1. Minimax-efficiency and spatial adaptivity - local alternatives I

In this section we show that the multiple testing procedure introduced above possesses optimality
properties in a certain minimax sense. Nonparametric comparison of different samples was recently
investigated in the minimax approach by Butucea and Tribouley (2006), in a rate-adaptive way
and in a different sense from our results here. We focus mainly on the considerably more involved
problem of efficient adaptivity. Let us first introduce some notation. For any set
$J \subset [0,1]^d$ and function $f$ from $[0,1]^d$ to $\mathbb R$, $\|f\|_J := \sup_{x \in J} |f(x)|$.
For any convex $I \subset \mathbb R^d$ let $\mathcal H_d(\beta, L; I)$ denote the isotropic Hölder
smoothness class, which for $\beta \leq 1$ equals

$$\mathcal H_d(\beta, L; I) := \big\{ \phi : I \to \mathbb R : |\phi(x) - \phi(y)| \leq L \|x - y\|_2^\beta \text{ for all } x, y \in I \big\}.$$

Let $\lfloor \beta \rfloor$ denote the largest integer strictly smaller than $\beta$. For
$\beta > 1$, $\mathcal H_d(\beta, L; I)$ consists of all functions $f : I \to \mathbb R$ that are
$\lfloor \beta \rfloor$ times continuously differentiable such that the following property is
satisfied: if $P_y^{(f)}$ denotes the Taylor polynomial of $f$ at the point $y \in I$ up to the
$\lfloor \beta \rfloor$'th order,

$$|f(x) - P_y^{(f)}(x)| \leq L \|x - y\|_2^\beta \quad \text{for all } x, y \in I.$$

In particular the definition entails that $f \in \mathcal H_d(\beta, L; \mathbb R^d)$ implies
$f \circ U \in \mathcal H_d(\beta, L; \mathbb R^d)$ for every orthonormal transformation
$U \in \mathbb R^{d \times d}$. For any pair of densities $(p, q)$ on $[0,1]^d$, let $h(m,n,p,q)$
denote the corresponding mixed density $(m/n)p + (1 - m/n)q$. Fix a continuous density $h > 0$ and
define $\mathcal F_h^{(m,n)}(\beta, L)$ to be the set of pairs of densities such that

$$\phi(m,n,p,q) := \frac{p - q}{\sqrt{h(m,n,p,q)}} \in \mathcal H_d(\beta, L; [0,1]^d) \quad \text{and} \quad h(m,n,p,q) = h.$$

Reparametrizing the composite hypothesis. With the notation above,

$$p = h \cdot \big( 1 + (1 - m/n)\, \phi / \sqrt h \big) \quad \text{and} \quad q = h \cdot \big( 1 - (m/n)\, \phi / \sqrt h \big).$$

Consequently "$p = q$" is equivalent to "$\phi = 0$", and if $(m/n)p + (1 - m/n)q = h$ is kept fixed,
the composite hypothesis "$p = q$" reduces to the simple hypothesis "$\phi = 0$". In order to develop
a meaningful notion of minimax-efficiency for the two-sample problem we subsequently treat the
mixed density $h = h(m,n,p,q)$ as fixed but unknown infinite dimensional nuisance parameter for
testing the hypothesis

$$H_0 : \phi = 0 \quad \text{versus} \quad H_A : \phi \neq 0.$$

Note that in case $h$ is uniformly bounded away from zero and $p$ is close to $q$, $\phi$ coincides
approximately with the difference $2(\sqrt p - \sqrt q)$; see also the explanation subsequent to
Theorem 3.
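The reparametrization is a simple algebraic identity, which can be checked numerically on a grid. The sketch below uses illustrative densities on $(0,1)$ and the hypothetical names `reparametrize` and `invert`; `frac` stands for $m/n$, and $\phi = (p - q)/\sqrt h$ with $h = (m/n)p + (1 - m/n)q$.

```python
import numpy as np

def reparametrize(p, q, frac):
    """Map density values (p, q) to (phi, h), where frac = m/n."""
    h = frac * p + (1 - frac) * q
    phi = (p - q) / np.sqrt(h)
    return phi, h

def invert(phi, h, frac):
    """Recover (p, q) from (phi, h) via the two displayed formulas."""
    p = h * (1 + (1 - frac) * phi / np.sqrt(h))
    q = h * (1 - frac * phi / np.sqrt(h))
    return p, q

x = np.linspace(0.05, 0.95, 19)
frac = 0.3

# Round trip for two clearly different densities on (0, 1).
p, q = 2 * x, 2 * (1 - x)
phi, h = reparametrize(p, q, frac)
p_back, q_back = invert(phi, h, frac)
print(np.allclose(p, p_back), np.allclose(q, q_back))

# Near the null, phi is close to 2 * (sqrt(p) - sqrt(q)).
eps = 0.01 * np.cos(2 * np.pi * x)
p2, q2 = 1 + eps, 1 - eps
phi2, _ = reparametrize(p2, q2, frac)
print(np.max(np.abs(phi2 - 2 * (np.sqrt(p2) - np.sqrt(q2)))))
```

The second check illustrates the approximation $\phi \approx 2(\sqrt p - \sqrt q)$, which follows from $\sqrt p - \sqrt q = (p - q)/(\sqrt p + \sqrt q)$ when both densities are close to $h$.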

Remark. It is worth noting that the optimal statistic for testing $H_0$ against any fixed
alternative $\phi$ equals the likelihood ratio statistic

$$\prod_{i=1}^{m} \Big( 1 + (1 - m/n) \frac{\phi(X_i)}{\sqrt{h(X_i)}} \Big) \prod_{j=m+1}^{n} \Big( 1 - (m/n) \frac{\phi(X_j)}{\sqrt{h(X_j)}} \Big),$$

whose distribution still depends on $h$ under the null. Here and subsequently, the subscript
$(m,n,p,q)$ indicates the distribution with density $\prod_{i=1}^{m} p \prod_{i=m+1}^{n} q$. The
rationale behind the reparametrization is to eliminate the dependency on the nuisance parameter $h$
in the expectation under the null of the first and second order term of the log-likelihood
expansion, resulting in asymptotic independence of $h$ for its distribution under the hypothesis for
any local sequence.

The subsequent theorem is about the lower bound of hypothesis testing within the classes of
densities defined above.

Theorem 3 (Minimax lower bound). Let 

Pm,n ■■= I —7 T and define c{fj, L) 

where $\gamma_\beta$ defines the solution to the optimal recovery problem (2) below. Assume that the
sequence of mixed densities $(h_n)$ on $[0,1]^d$ is equicontinuous and uniformly bounded away from
zero. Then for any fixed $\delta > 0$ and every nondegenerate rectangle $J \subset [0,1]^d$,

$$\limsup_{n \to \infty}\ \inf_{(p,q) \in \mathcal F_{h_n}^{(m,n)}(\beta, L):\ \|\phi\|_J \geq (1 - \delta) c(\beta, L) \rho_{m,n}} \mathbb E_{(m,n,p,q)} \psi_n \leq \alpha$$

for arbitrary tests $\psi_n$ at significance level $\leq \alpha$.

Note that $\psi_n$ may depend on $(\beta, L)$ and even on the nuisance parameter $h_n$, as already
does the Neyman-Pearson test for testing $H_0$ against any one-point alternative.

We now turn to the investigation of the test introduced in Section 2. To motivate the choice
of an optimal kernel for our test statistics and its relation to the optimal recovery problem, let
us restrict our consideration to the Gaussian white noise context, leading in case of univariate
Hölder continuous densities on $[0,1]$ with $\beta > 1/2$ to locally asymptotically equivalent
experiments

$$dX_{1n}(t) = p_n(t)\, dt + \frac{\sqrt{h_n(t)}}{\sqrt m}\, dW_1(t) \quad \text{and} \quad dX_{2n}(t) = q_n(t)\, dt + \frac{\sqrt{h_n(t)}}{\sqrt{n - m}}\, dW_2(t)$$

for two independent Brownian motions $W_1$ and $W_2$ on the unit interval (Nussbaum 1996, Theorem
2.7 with $f_0 = h_n$ and Remark 2.8). A multiscale statistic built from standardized differences of
kernel estimates

$$\sqrt n\, \sqrt{(m/n)(1 - m/n)}\ \frac{\int \psi(t) \big( dX_{1n}(t) - dX_{2n}(t) \big)}{\|\psi \sqrt{h_n}\|_2}$$

(which is actually not admissible since $h_n$ is unknown in general) then yields a distribution
under the null close to ours in Theorem 2, up to the fact that our local integrals in dimension one
are taken with respect to a Brownian bridge, reformulated to a Wiener process integrand by change
of the kernel. Concerning the optimization of $\psi$, the quantity to be maximized within this
Gaussian white noise context appears to be the expectation of the single test statistics under the
least favorable alternatives, as their variances do not depend on the mean. In case $h_n = 1$ this
expression equals

$$\inf_{\phi \in \mathcal H_1(\beta, L; [0,1]):\ \|\phi\|_J \geq \delta}\ \frac{\int \phi(t) \psi(t)\, dt}{\|\psi\|_2},$$

leading to the dual representation of the optimal recovery problem (see Donoho 1994a).

The optimal recovery problem in higher dimension. In the framework of isotropic Hölder
balls, the optimal recovery problem leads to the solution $\gamma = \gamma_\beta$ of the
optimization problem

$$\text{Minimize } \|\gamma\|_2 \text{ over all } \gamma \in \mathcal H_d(\beta, 1; \mathbb R^d) \text{ with } \gamma(0) \geq 1. \qquad (2)$$

The closedness of $\mathcal H_d(\beta, 1; \mathbb R^d) \cap \{\gamma : \mathbb R^d \to \mathbb R \mid \gamma(0) \geq 1\}$
in $L_2$ entails that the solution exists; its convexity furthermore implies uniqueness, whence by
isotropy of the functional class $\mathcal H_d(\beta, 1; \mathbb R^d)$ it must be radially symmetric.
In case $\beta \leq 1$, one easily verifies that
$\gamma_\beta(x) = \psi_\beta(\|x\|_2) = (1 - \|x\|_2^\beta)_+$. In its generality, the optimal
recovery problem in higher dimension has not yet been investigated. Considering the partial
derivatives of $\gamma_\beta$ along the coordinate axes entails that $\psi_\beta$ is necessarily
contained in $\mathcal H_1(\beta, L; \mathbb R)$. However, the transferred optimization problem

$$\text{minimize } \int \psi(r)^2 |r|^{d-1}\, dr \text{ over all } \psi \text{ with } \psi(\|\cdot\|_2) \in \mathcal H_d(\beta, 1; \mathbb R^d) \text{ and } \psi(0) \geq 1 \qquad (3)$$

does not coincide with the univariate optimal recovery problem due to the additional weighting by
$|r|^{d-1}$ which comes into play by polar coordinate transformation. Whether the solution of (3)
for $\beta > 1$ is compactly supported or not is still open. For the case of univariate densities,
it is known that the solution of the optimal recovery problem has compact support for any
$\beta > 0$ (Leonov 1997), but an explicit solution in case $\beta > 1$ is known for $\beta = 2$ only.
Concerning details and advice on its construction, see Donoho (1994b) and Leonov (1999). For
dimension $d > 1$, see Klemelä and Tsybakov (2001).
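For $\beta \leq 1$ the kernel $(1 - \|x\|_2^\beta)_+$ can be evaluated directly, and its membership in the Hölder class with constant 1 can be probed numerically. The sketch below is a plausibility check on random pairs of points, not a proof; the function names, the dimension `d = 3` and the sampling cube are illustrative assumptions.

```python
import numpy as np

def gamma_beta(x, beta):
    """(1 - ||x||_2^beta)_+ evaluated row-wise for points x in R^d, 0 < beta <= 1."""
    r = np.linalg.norm(np.atleast_2d(x), axis=1)
    return np.maximum(1.0 - r ** beta, 0.0)

def holder_violation(beta, d=3, n_pairs=10000, seed=0):
    """Largest observed value of |g(x) - g(y)| - ||x - y||_2^beta over random pairs.

    A value <= 0 (up to rounding) is consistent with the Hölder condition for L = 1.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.5, 1.5, size=(n_pairs, d))
    y = rng.uniform(-1.5, 1.5, size=(n_pairs, d))
    lhs = np.abs(gamma_beta(x, beta) - gamma_beta(y, beta))
    rhs = np.linalg.norm(x - y, axis=1) ** beta
    return np.max(lhs - rhs)

print(gamma_beta(np.zeros((1, 3)), 0.5))            # value at the origin
print(holder_violation(0.5), holder_violation(1.0))
```

The check reflects the inequality $|\,\|x\|^\beta - \|y\|^\beta| \leq \|x - y\|^\beta$ for $\beta \leq 1$, combined with the fact that taking the positive part does not increase differences.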

The next theorem is about the asymptotic power of the multiple test developed in Section 2. We
restrict our attention to compact rectangles of $(0,1)^d$ to avoid boundary effects. This
restriction may be relaxed by the use of suitable boundary kernels, extending those of Lepski and
Tsybakov (2000) for the univariate regression case to higher dimension.

Theorem 4 (Adaptivity and minimax efficiency). Let $\varphi^*_{n,\alpha}$ denote the multiple
rerandomization test at significance level $\alpha$, based on the kernel
$\psi_\beta 1\{\cdot \geq 0\}$ rescaled to $[0,1]$. In case of unbounded support of $\psi_\beta$, we
may use a truncated solution $\psi_{\beta,K} = \psi_\beta 1\{0 \leq \cdot \leq K\}$. Let
$0 < \liminf_n m/n \leq \limsup_n m/n < 1$. Assume that $(h_n)$ is equicontinuous and uniformly
bounded away from zero. Then for any fixed $\delta > 0$, there exists a $K > 0$ such that

$$\liminf_{n \to \infty}\ \inf_{(p,q) \in \mathcal F_{h_n}^{(m,n)}(\beta, L):\ \|\phi\|_J \geq (1 + \delta) c(\beta, L) \rho_{m,n}} \mathbb P_{(m,n,p,q)}\big( \varphi^*_{n,\alpha} = 1 \big) = 1$$

for any nondegenerate compact rectangle $J \subset (0,1)^d$.

In particular, the test is sharp-optimal adaptive with respect to the second Hölder parameter
$L$. While in view of the results in Ingster (1987) the optimal rate of testing may be expected,
some technical effort was required to propose a calibration achieving even sharp
minimax-optimality.

Remark. It is worth noting that the procedure achieves the upper bound uniformly over
a large class of possible mixed densities. The intrinsic reason is that conditioning on
$\mathcal X_n$ is actually equivalent to conditioning on $\hat H_n$, which indeed is a sufficient and
complete statistic for the nuisance functional $H_n$.

Remark (Sharp adaptivity with respect to $\beta$ and $L$). Our construction, including the procedure
especially designed for the one-dimensional situation, involves one kernel, shifted and rescaled
depending on location and volume of the nearest-neighbor cluster under consideration. Due to the
dependency of the optimal recovery solution $\gamma_\beta$ on $\beta$, the corresponding test
statistic $T_n = T_n(\beta)$ achieves sharp adaptivity with respect to the second Hölder parameter
$L$ only. Taking in addition the supremum $T_n^* := \sup_{\beta \in [\beta_0, \beta_1]} T_n(\beta)$
over all kernels $\gamma_\beta$ within a compact range $[\beta_0, \beta_1] \subset (0, \infty)$,
sharp adaptivity with respect to both Hölder parameters may be attained, provided that the above
supremum statistic still defines a tight sequence (in probability), i.e. the corresponding
sequence of $(1 - \alpha)$-quantiles $\kappa^*_\alpha(X)$ is stochastically bounded. Then the
convergence

$$\mathbb P_{(m,n,p_n,q_n)}\big( T_n^* > \kappa^*_\alpha(X) \big) \geq \mathbb P_{(m,n,p_n,q_n)}\big( |T_{j_n k_n n}(\beta)| - C_{j_n k_n n}(\beta) > \kappa^*_\alpha(X) \big) \to 1 \quad \text{as } n \to \infty$$

for any random couple $(j_n, k_n)$ and any choice of $\beta$ could be extracted from the proof of
Theorem 3. At least for $\beta \in [\beta_0, 1]$ this tightness may be deduced from the fact that
the unimodal and symmetric $\psi_\beta$ depends continuously on $\beta$ in the sup-norm; in
particular $\mathcal L\big( \int \phi_{rt}^{(\beta)}(x)\, dW(x) \big)$ as defined in Theorem 2 with
$\psi = \psi_\beta$ depends continuously on $\beta$ in the weak topology. A general investigation,
especially for $\beta > 1$, is beyond the scope of this article.

The next theorem shows, however, that our procedure simply based on the rectangular kernel
is rate-adaptive with respect to both Hölder parameters $(\beta, L)$. Due to the fact that it
combines locally all nearest-neighbor scales at the same time, it even adapts to inhomogeneous
smoothness of $p - q$, i.e. achieves spatial adaptivity.

Theorem 5 (Spatial rate-optimality). Let $\varphi^*_{n,\alpha}$ denote the multiple rerandomization
test based on the rectangular kernel. Assume that $0 < \liminf_n m/n \leq \limsup_n m/n < 1$. Then
for any fixed $k \in \mathbb N$ and parameters $(\beta_1, \dots, \beta_k, L_1, \dots, L_k)$, $K > 0$
and any collection of disjoint compact rectangles $J_i \subset [0,1]^d$, $i = 1, \dots, k$, there
exist constants $d_i = d(\beta_i, L_i, K)$ with

$$\liminf_{n \to \infty}\ \inf_{(p-q)|_{J_i} \in \mathcal H_d(\beta_i, L_i; J_i),\ \|p - q\|_{J_i} \geq d_i \rho_{m,n}(\beta_i)} \mathbb P_{(m,n,p,q)}\big( J_i \cap \mathcal D_\alpha(\mathcal X_n) \neq \emptyset\ \ \forall\, i = 1, \dots, k \big) = 1.$$

3.2. The stylized type of locally constant alternatives on small and large scales - 
local alternatives II 

The results from the previous paragraph deal with small scales of different (arbitrary) order
depending on the smoothness classes under consideration. In particular, the minimax lower bound
is concerned with scales tending to zero as $m, n \to \infty$, and it is not yet clear that there is
no substantial loss at rather large scales. The size of possible deviation $\|\phi\|_{\sup}$ and the
scale (here $\sim (\|\phi_n\|_{\sup}/L)^{1/\beta}$) are linked in a specific way depending on the
smoothness class under consideration, because the smoothness assumptions do not allow for
arbitrarily fast decay to zero. The next theorem is different in spirit. We do not focus on
smoothness classes but on stylized situations with $\phi$ being lower bounded by a "plateau" of
absolute value $c_n / \sqrt{n \delta_n^d}$ within a ball $B_x(\delta_n)$. With $\lambda$ denoting
the Lebesgue measure on $\mathcal B([0,1]^d)$, define

$$\mathcal J_+^{(m,n)}(c, x, \delta) := \Big\{ p, q\ \lambda\text{-densities on } [0,1]^d : \phi(m,n,p,q)(z) \geq \frac{c}{\sqrt{n \delta^d}}\ \ \forall\, z \in B_x(\delta) \Big\}, \quad 0 < c \leq \sqrt{n \delta^d},$$

$$\mathcal J_-^{(m,n)}(c, x, \delta) := \Big\{ p, q\ \lambda\text{-densities on } [0,1]^d : \phi(m,n,p,q)(z) \leq -\frac{c}{\sqrt{n \delta^d}}\ \ \forall\, z \in B_x(\delta) \Big\}, \quad 0 < c \leq \sqrt{n \delta^d},$$

and

$$\mathcal J^{(m,n)}(c, x, \delta) := \mathcal J_+^{(m,n)}(c, x, \delta) \cup \mathcal J_-^{(m,n)}(c, x, \delta).$$
Theorem 6. Assume that $0 < \liminf_n m/n \leq \limsup_n m/n < 1$.

(i) If $\psi_n$ is any sequence of tests at significance level $\alpha \in (0, 1)$, then

implies that $n \delta_n^d \to \infty$ and $c_n \to \infty$.

(ii) If $\varphi^*_{n,\alpha}$ describes the multiple rerandomization test based on the rectangular
kernel at significance level $\alpha \in (0, 1)$,

$$\inf_{(p,q) \in \mathcal J^{(m,n)}(c_n, x, \delta_n),\ h(m,n,p,q) \geq K > 0} \mathbb E_{(m,n,p,q)}\, \varphi^*_{n,\alpha} \to 1,$$

provided that $n \delta_n^d \to \infty$ and $\sqrt{\log(1/\delta_n)} / c_n \to 0$.

In particular our test is also consistent against local alternatives of the type
$\kappa_n \phi / \sqrt n$ for $\kappa_n \to \infty$. Comparing (i) and (ii) demonstrates that the
adaptive search for the location of deviations costs an additional logarithm of its inverse scale.
One may read out of the proof that the restriction for the sequence $(c_n)$ in (ii) can be slightly
refined.

4. Application to classification 

Suppose we are given an iid sample $(X_i, Y_i)$, $i = 1, \dots, n$, where the marginal distribution
of $X_i$ is assumed to be Lebesgue-continuous with density $h$ on $\mathbb R^d$, and $Y_i$ takes
values in $\{0, 1\}$ with

$$\mathbb P\big( Y_i = 1 \mid X_i = x \big) = p(x).$$

Then $M := \sum_{i=1}^n Y_i \sim \mathrm{Bin}(n, \lambda)$ with $\lambda := \int p(x) h(x)\, dx$.
Assuming $\lambda \in (0, 1)$ to be known, the question of local classification is to identify
simultaneously subregions in $\mathbb R^d$ where $p$ deviates significantly from $\lambda$, which
results in local testing of the hypothesis

$$H_0 : p = \lambda \quad \text{versus} \quad H_A : p \neq \lambda.$$

Imitating our procedure introduced in Section 2, we may combine suitably standardized local
weighted averages of labels, but the standardization differs due to the fact that the sum of
(strictly) positive labels is random and not fixed; in particular $Y_1, \dots, Y_n$ are
stochastically independent. Consequently, we may then base the procedure on the classical Bernstein
exponential inequality for weighted averages of standardized Bernoullis. Of course, the optimal
separation constant for testing "$p = \lambda$" within some Euclidean ball $B_t(r)$ and its
complement depends on the amount of observations in $B_t(r)$, whence analogously to the
consideration above for the two-sample problem we may use the reparametrization of $(p, h)$ to
$(\phi, h)$ with

$$\phi := \frac{(p - \lambda) \sqrt h}{\lambda (1 - \lambda)}.$$

The power optimality results carry over to the classification context with similar arguments as
used in the proof of Theorem 4. We omit its explicit formulation at this point.



5. Distribution-freeness via quantile transformation the case d=l 

The one-dimensional situation allows for an alternative and more elegant approach based on order 
relations. For let X(i), denote the order statistic built from the pooled sample and define 

for any < j < k < n the local test statistics 

where 

Compared to the procedure described in the previous section, we omit the explicit dependence
of the weights on the observed values. Note that in contrast to nearest-neighbor relations, the
order remains invariant under quantile transformation, i.e. the rank of each transformed observation
equals $\mathrm{rank}(X_i)$, resulting in distribution-freeness of the corresponding multiscale statistic
under the null. Suppose the null hypothesis is satisfied for some Lebesgue continuous distribution
on the real line. Then, conditionally on the order statistics as well as unconditionally, the label
vector is uniformly distributed on the set

$$\Big\{ \Lambda \in \{ n/m,\ -n/(n-m) \}^n : \sum_{i=1}^n \Lambda_i = 0 \Big\}.$$

The described test statistics are local versions of classical Wilcoxon rank sum statistics. We omit
any further investigation, as the calibration for multiple testing can be done analogously to that
proved in Theorem 2; keep in mind, however, that the approximating Gaussian multiscale statistic
under the null hypothesis will be independent of the nuisance functional $H_n$ due to the quantile
transformation. Note that the use of typical mathematical tools for the power investigation of rank
statistics, like Hoeffding's decomposition, becomes involved because the kernel $\psi_\beta$ for $\beta \le 1$ is not
differentiable.
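
As a purely illustrative sketch of the two facts used in this section, the following Python snippet checks on a small made-up pooled sample that ranks are unchanged by a strictly increasing transformation and that labels with values $n/m$ and $-n/(n-m)$ sum to zero. The function names, the sample values, the use of `math.atan` as a stand-in for the quantile transformation, and the unweighted window sum are assumptions made for this example only; they are not the weighted local statistics of the paper.

```python
import math
from fractions import Fraction


def ranks(values):
    """Rank of each entry within the pooled sample (1 = smallest, no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def labels(n, m):
    """Label n/m for the first m observations and -n/(n-m) for the remaining n-m."""
    return [Fraction(n, m)] * m + [-Fraction(n, n - m)] * (n - m)


def window_sum(values, lab, j, k):
    """Hypothetical unweighted local statistic: sum of the labels attached to
    the order statistics X_(j+1), ..., X_(k) of the pooled sample."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    return sum(lab[i] for i in order[j:k])


# Made-up pooled sample: the first m = 3 values play the role of the first sample.
x = [0.3, 2.5, -1.2, 0.9, 4.0, -0.4, 1.7]
n, m = len(x), 3
lab = labels(n, m)

# Any strictly increasing map leaves the ranks, and hence every window sum, unchanged.
y = [math.atan(v) for v in x]
ranks_invariant = ranks(x) == ranks(y)
windows_invariant = all(
    window_sum(x, lab, j, k) == window_sum(y, lab, j, k)
    for j in range(n)
    for k in range(j + 1, n + 1)
)
labels_sum_to_zero = sum(lab) == 0
```

Because the statistic depends on the data only through the order of the pooled sample, its null distribution does not depend on the underlying continuous distribution, which is the distribution-freeness referred to above.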
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6. Proofs and further probabilistic results 

6.1. Decoupling inequality and coupling exponential bounds 

This section contains the coupling exponential bounds, in this context for weighted averages
from a hypergeometric ensemble. Using a different technique, namely an explicit coupling construction,
the subsequent proposition extends results of Hoeffding (1963) on decoupling of expectations
of convex functions in the arithmetic mean of a sample without replacement. Whereas in the latter
case decoupling with constant 1 is actually correct, a simple counterexample for an ensemble of
two elements already shows that the result does not extend to arbitrary weighted averages, and
some payment for decoupling appears to be necessary.

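Proposition 7 (Decoupling inequality). Let $Z_1, Z_2, \dots, Z_n$ be iid with

$$P(Z_i = 1) = \frac{m}{n} \quad \text{and} \quad P(Z_i = 0) = 1 - \frac{m}{n}, \qquad 0 < m < n.$$

Let $a \in \mathbb{R}^n$ with $\sum_{i=1}^n a_i = 0$ and $\Psi : \mathbb{R} \to \mathbb{R}$ be convex. Then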



E 




with 




In particular, $\delta(m,n)^{-1} = 1 + O(n^{-1/2})$ for $m/n \to \lambda \in (0,1)$.



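Proof. Let $X$ be uniformly distributed on the set

$$\Big\{ x \in \{0,1\}^n : \sum_{i=1}^n x_i = m \Big\}$$

and let $S \sim \mathrm{Bin}(n, m/n)$ such that $X$ and $S$ are independent. Define

$$M := \{ i : X_i = 1 \}.$$

Conditional on $X$ and $S$, the random vector $Z \in \{0,1\}^n$ is constructed as follows:

If $S \ge m$, let $Z_i = 1$ for all $i \in M$ and let $(Z_i)_{i \in M^c}$ be uniformly distributed on the set

$$\Big\{ z \in \{0,1\}^{M^c} : \sum_{i \in M^c} z_i = S - m \Big\}.$$

For $S < m$, let $Z_i = 0$ for all $i \in M^c$ and let $(Z_i)_{i \in M}$ be uniformly distributed on

$$\Big\{ z \in \{0,1\}^{M} : \sum_{i \in M} z_i = S \Big\}.$$

Note that $Z_1, \dots, Z_n$ are iid $\mathrm{Bin}(1, m/n)$. Then

$$E\,\Psi\Big( \sum_{i=1}^n a_i Z_i \Big) = E\, E\Big( \Psi\Big( \sum_{i=1}^n a_i Z_i \Big) \,\Big|\, X, S \Big).$$

Furthermore,

$$E\Big( \Psi\Big( \sum_{i=1}^n a_i Z_i \Big) \,\Big|\, X, S \Big) \ge \Psi\Big( E\Big( \sum_{i=1}^n a_i Z_i \,\Big|\, X, S \Big) \Big) \quad \text{(Jensen inequality)}$$

$$= \Psi\Big( I\{S < m\} \frac{S}{m} \sum_{i \in M} a_i + I\{S \ge m\} \Big( \sum_{i \in M} a_i + \frac{S - m}{n - m} \sum_{i \in M^c} a_i \Big) \Big)$$

$$= \Psi\Big( I\{S < m\} \frac{S}{m} \sum_{i \in M} a_i + I\{S \ge m\} \frac{n - S}{n - m} \sum_{i \in M} a_i \Big) \quad \Big(\text{since } \sum_{i=1}^n a_i = 0\Big)$$

$$= \Psi\Big( \min\Big( \frac{S}{m}, \frac{n - S}{n - m} \Big) \sum_{i=1}^n a_i X_i \Big).$$

Since $X$ and $S$ are independent, a second application of Jensen's inequality yields

$$E\, E\Big( \Psi\Big( \sum_{i=1}^n a_i Z_i \Big) \,\Big|\, X, S \Big) \ge E\, E\Big( \Psi\Big( \min\Big( \frac{S}{m}, \frac{n - S}{n - m} \Big) \sum_{i=1}^n a_i X_i \Big) \,\Big|\, X \Big) \ge E\, \Psi\Big( E \min\Big( \frac{S}{m}, \frac{n - S}{n - m} \Big) \sum_{i=1}^n a_i X_i \Big) \quad \text{(Jensen inequality)}.$$

Finally,

$$E \min\Big( \frac{S}{m}, \frac{n - S}{n - m} \Big) = 1 - E\Big( \frac{(S - m)_-}{m} + \frac{(S - m)_+}{n - m} \Big) \ge 1 - \frac{E|S - m|}{\min(m, n - m)} \ge 1 - \frac{\Delta(m,n)}{\sqrt{n}}$$

with $\Delta(m,n) := \sqrt{m(n-m)} / \min(m, n-m)$, which is uniformly bounded for $m/n \to \lambda \in (0,1)$.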
□ 
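
The coupling used in the proof can be checked by brute force for small $n$. The following Python sketch, which is an illustration and not part of the original argument, enumerates the construction exactly with rational arithmetic and confirms that the coupled vector $Z$ has the law of $n$ independent $\mathrm{Bin}(1, m/n)$ variables. The function names `coupled_law` and `product_law` and the choice $n = 5$, $m = 2$ are assumptions made for this example.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def coupled_law(n, m):
    """Exact law of Z when X is uniform on {x in {0,1}^n : sum x_i = m},
    S ~ Bin(n, m/n) is independent of X, and Z is built from (X, S) as in the proof."""
    p = Fraction(m, n)
    indices = range(n)
    law = {}
    for M in combinations(indices, m):                # M = {i : X_i = 1}
        weight_x = Fraction(1, comb(n, m))
        complement = [i for i in indices if i not in M]
        for s in range(n + 1):                        # value of S
            weight_s = comb(n, s) * p**s * (1 - p) ** (n - s)
            if s >= m:
                # Z_i = 1 on M; the remaining s - m ones are placed uniformly on M^c.
                fills = [set(M) | set(extra) for extra in combinations(complement, s - m)]
            else:
                # Z_i = 0 on M^c; s ones are placed uniformly inside M.
                fills = [set(sub) for sub in combinations(M, s)]
            for ones in fills:
                z = tuple(1 if i in ones else 0 for i in indices)
                law[z] = law.get(z, 0) + weight_x * weight_s / len(fills)
    return law


def product_law(n, m):
    """Law of n iid Bernoulli(m/n) variables."""
    p = Fraction(m, n)
    return {z: p ** sum(z) * (1 - p) ** (n - sum(z)) for z in product((0, 1), repeat=n)}


coupling_is_iid = coupled_law(5, 2) == product_law(5, 2)
```

The check works because, given $S = s$, the constructed $Z$ is uniform over all vectors with exactly $s$ ones, while $S$ itself is binomial.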

Using the decoupling above, the next proposition presents the exponential bounds for the combinatorial
process which are essential for our construction. It implies Proposition 1 and improves
in particular the exponential tail bounds for the hypergeometric distribution of Serfling (1974) in the
coefficient in front of $\eta^2$ for $m/n$ close to zero or one, moderate $\eta$ and large $n$. Note that this
coefficient is crucial for the efficiency of the testing procedure. The results may also be compared
with the decoupling based exponential tail bounds in de la Peña (1994, 1999).

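Proposition 8 (Coupling exponential inequalities). Let $Z_1, \dots, Z_n$ be iid with

$$P(Z_i = 1) = \frac{m}{n} \quad \text{and} \quad P(Z_i = 0) = 1 - \frac{m}{n}, \qquad 0 < m < n.$$

Let $\psi_1, \dots, \psi_n$ be real valued numbers with $\bar\psi$ its arithmetic mean and denote

$$\gamma_{m,n}^2 := \mathrm{Var}\Big( \sum_{i=1}^n \psi_i Z_i \,\Big|\, \sum_{i=1}^n Z_i = m \Big) = \frac{m(n-m)}{n(n-1)} \sum_{i=1}^n (\psi_i - \bar\psi)^2.$$

Then in case of $\gamma_{m,n} \neq 0$,

$$P\Big( \Big| \frac{1}{\gamma_{m,n}} \sum_{i=1}^n \psi_i \Big( Z_i - \frac{m}{n} \Big) \Big| > \delta(m,n)\, \eta \,\Big|\, \sum_{i=1}^n Z_i = m \Big) \le 2 \exp\Big( - \frac{\eta^2/2}{1 + \eta\, R(\psi, m, n)} \Big) \le 2 \exp\Big( - \frac{3\eta}{2 c(m,n)} + \frac{9}{2 c(m,n)^2} \Big),$$

where

$$R(\psi, m, n) := \frac{\max_i |\psi_i - \bar\psi|}{3\, \gamma_{m,n}} \max\Big( \frac{m}{n}, 1 - \frac{m}{n} \Big) \quad \text{and} \quad c(m,n) := \frac{\max(m, n-m)}{\sqrt{m(n-m)}}.$$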

Proof With 



M 



■ max 



n / 



we obtain for any t > Q 



(1 " TO 
Y^AZ,--) > S{77l,7l)7 
7m,n ~l ^ 



1 " TO 

E(^*~'^)(^* ) > S{m,n)7j 



TO 



(-^^)E{e-P 



t 8(771, 7l) 



2— 1 ^ i— 1 



< exp 

\ IVl J 

< exp ( - i ^) Eexp(^^^^-^ ^i^^ " ^) " ^)) (Proposition [T]) 

1 



< 



M2 



(4) 



whereby the last inequality follows from the fact that for any random variable $Y$ with $|Y| \le 1$,
$EY = 0$ and $\mathrm{Var}(Y) = \sigma^2$,

$$E \exp(tY) \le 1 + \sigma^2 (e^t - 1 - t) \le \exp\big( \sigma^2 (e^t - 1 - t) \big).$$

Elementary algebra shows that (4) is minimized with the choice $t := \log(1 + \eta M)$, which yields
first a Bennett (1962) exponential bound and, because of $(1+x)\log(1+x) - x \ge (x^2/2)/(1 + x/3)$,
consequently the Bernstein type bound

$$P\Big( \frac{1}{\gamma_{m,n}} \sum_{i=1}^n \psi_i \Big( Z_i - \frac{m}{n} \Big) > \delta(m,n)\, \eta \,\Big|\, \sum_{i=1}^n Z_i = m \Big) \le \exp\Big( - \frac{\eta^2/2}{1 + \eta M/3} \Big).$$

A symmetry argument provides the same bound for $\psi_i$ replaced by $-\psi_i$, which completes the proof
of the first inequality. Using that $\gamma_{m,n} \ge \sqrt{(m/n)(1 - m/n)}\, \max_i |\psi_i - \bar\psi|$, we obtain the second
asserted inequality from

$$\frac{\eta^2/2}{1 + \eta M/3} \ge \frac{\eta^2/2}{1 + \eta\, c(m,n)/3} = \frac{\eta}{2c(m,n)/3} - \frac{\eta}{\big(2c(m,n)/3\big)\big(1 + \eta\, c(m,n)/3\big)} \ge \frac{\eta}{2c(m,n)/3} - \frac{1}{2c(m,n)^2/9}.$$
□

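
The elementary inequality behind the step from the Bennett to the Bernstein bound can be checked numerically. The following Python sketch is only an illustration: it evaluates both sides of $(1+x)\log(1+x) - x \ge (x^2/2)/(1+x/3)$ on a logarithmic grid and compares the two resulting tail bounds for a variable with unit variance and increments bounded by $M$. The function names and the grid are assumptions for this example, and the form $\exp(-h(\eta M)/M^2)$ is the textbook Bennett bound for unit variance, not necessarily the exact intermediate expression (4) of the proof.

```python
import math


def bennett_h(x):
    """h(x) = (1 + x) log(1 + x) - x, the function appearing in Bennett's bound."""
    return (1 + x) * math.log1p(x) - x


def bernstein_g(x):
    """g(x) = (x^2 / 2) / (1 + x / 3), the function appearing in Bernstein's bound."""
    return (x * x / 2) / (1 + x / 3)


def bennett_bound(eta, M):
    """Textbook Bennett tail bound for unit variance and summands bounded by M."""
    return math.exp(-bennett_h(eta * M) / (M * M))


def bernstein_bound(eta, M):
    """Bernstein type tail bound exp(-(eta^2 / 2) / (1 + eta M / 3))."""
    return math.exp(-(eta * eta / 2) / (1 + eta * M / 3))


# Logarithmic grid from 10^-3 to 10^3.
grid = [10 ** (k / 10) for k in range(-30, 31)]
h_dominates_g = all(bennett_h(x) >= bernstein_g(x) for x in grid)
bennett_is_sharper = all(
    bennett_bound(eta, M) <= bernstein_bound(eta, M)
    for eta in (0.5, 1.0, 2.0, 5.0)
    for M in (0.1, 0.5, 1.0, 3.0)
)
```

Since $h(\eta M)/M^2 \ge (\eta^2/2)/(1 + \eta M/3)$, the Bennett bound is never larger than the Bernstein bound; the latter is used because it can be inverted explicitly.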
6.2. Auxiliary results about empirical processes 

This section collects results in the context of empirical processes which are essential for the next
section. For any totally bounded pseudo-metric space $(T, \rho)$, we define the covering number

$$N(\varepsilon, T, \rho) := \min\Big\{ \# T_0 : T_0 \subset T,\ \inf_{t_0 \in T_0} \rho(t, t_0) \le \varepsilon \text{ for all } t \in T \Big\}.$$

Let $\mathcal{B}(T)$ denote the Borel-$\sigma$-field on $T$ induced by the pseudo-metric $\rho$ (which induces a topology
in the usual sense, although without the Hausdorff property if it is not a metric) and let $\mathcal{F} \subset [0,1]^T$
be a family of measurable functions. For any probability measure $P$ on $\mathcal{B}(T)$, consider the pseudo-distance
$d_P(f, g) := \int |f - g|\, dP$ for $f, g \in \mathcal{F}$. Then for any $u > 0$, the uniform covering numbers of
$\mathcal{F}$ are defined as $N(u, \mathcal{F}) := \sup_P N(u, \mathcal{F}, d_P)$, where the supremum is running over all probability
measures $P$ on $\mathcal{B}(T)$.
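
For a finite index set, an upper bound on the covering number can be computed greedily. The following Python sketch is an illustration under the assumption of a finite set of points and a user-supplied pseudo-metric; the function name `greedy_net` and the integer example are hypothetical. The greedy centers form an $\varepsilon$-net, so their number bounds $N(\varepsilon, T, \rho)$ from above, but it is in general not the minimum. The centers are also pairwise more than $\varepsilon$ apart, so their number is a lower bound for the size of a maximal $\varepsilon$-separated subset.

```python
def greedy_net(points, eps, dist):
    """Pick centers one by one: a point becomes a new center whenever it is
    farther than eps from every center chosen so far."""
    centers = []
    for p in points:
        if all(dist(p, c) > eps for c in centers):
            centers.append(p)
    return centers


# Example on T = {0, 1, ..., 100} with rho(s, t) = |s - t| and eps = 10.
points = list(range(101))


def dist(s, t):
    return abs(s - t)


net = greedy_net(points, 10, dist)

# Every point of T lies within eps of some center (covering property) ...
is_cover = all(min(dist(p, c) for c in net) <= 10 for p in points)
# ... and distinct centers are more than eps apart (separation property).
is_separated = all(dist(a, b) > 10 for a in net for b in net if a != b)
```

In this example the greedy net has ten centers, whereas five centers suffice for a minimal cover, which illustrates the gap between a greedy net and the covering number itself.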

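Theorem 9. (Dümbgen and Walther (2008, technical report)) Let $Z = (Z(t))_{t \in T}$ be a stochastic
process on a totally bounded pseudo-metric space $(T, \rho)$. Let $K$ be some positive constant, and for
$\delta > 0$ let $G(\cdot, \delta)$ be a nondecreasing function on $[0, \infty)$ such that for all $\eta \ge 0$ and $s, t \in T$,

$$P\Big\{ \frac{|Z(s) - Z(t)|}{\rho(s,t)} > G(\eta, \delta) \Big\} \le K \exp(-\eta) \quad \text{if } \rho(s,t) \ge \delta. \qquad (5)$$

Then for arbitrary $\delta > 0$ and $a \ge 1$,

$$P\Big\{ |Z(s) - Z(t)| \ge 12\, J(\rho(s,t), a) \text{ for some } s, t \in T_0 \text{ with } \rho(s,t) \le \delta \Big\} \le \frac{K\delta}{2a},$$

where $T_0$ is a dense subset of $T$, and

$$J(\varepsilon, a) := \int_0^{\varepsilon} G\big( \log(a D(u)^2 / u), u \big)\, du,$$

$$D(u) = D(u, T, \rho) := \max\big\{ \# T_0 : T_0 \subset T,\ \rho(s,t) > u \text{ for different } s, t \in T_0 \big\}.$$

Remark. Suppose that $G(\eta, \delta) = \tilde{q}\, \eta^{q}$ for some constants $\tilde{q}, q > 0$. In addition let $D(u) \le A u^{-B}$
for $0 < u \le 1$ with constants $A \ge 1$ and $B > 0$. Then elementary calculations show that for
$0 < \varepsilon \le 1$ and $a \ge 1$, $J(\varepsilon, a) \le C \varepsilon \log(e/\varepsilon)^{q}$ with $C = \tilde{q} \max(1 + 2B, \log(aA^2))^{q} \int_0^1 \log(e/z)^{q}\, dz$.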



For the proof of Theorem 2 the subsequent extension of the Chaining Lemma VII.9 in Pollard
(1984) and Theorem 8 in the technical report to Dümbgen and Walther (2008) will be used. It
complements in particular the existing multiscale theory by a uniform tightness result and extends it to a
situation where only a sufficiently sharp uniform stochastic bound on local covering numbers is
available, which typically involves additional logarithmic terms. The situation arises for example in
the multivariate random design case where a non-stochastic bound obtained via uniform covering
numbers and VC-theory may be too rough.

Theorem 10 (Chaining). Let $(Y_n)_{n \in \mathbb{N}}$ be a sequence of random variables such that $Y_n$ takes
values in some Polish space $\mathcal{Y}_n$. For any $y_n \in \mathcal{Y}_n$, let $(Z_n(t; y_n))_{t \in T_{y_n}}$ be a stochastic process on
some countable metric space $(T_{y_n}, \rho_n(\cdot, \cdot; y_n))$, where $\rho_n(\cdot, \cdot; y_n) \le 1$. Suppose that the following
conditions are satisfied:

(i) There are measurable functions $\sigma_n(\cdot; Y_n) : T_{Y_n} \to (0, 1]$ and $G_n(\cdot, \delta) : [0, \infty) \to [0, \infty)$ such
that for arbitrary $s, t \in T_{Y_n}$, $\eta \ge 0$ and $\delta > 0$,

p(|Z„(t,y„)| >a„(t;r„)G„(r,,(5)|r„) < 2exp(-r7) z/ a„(i; y„) > 5, 

\crn{t;Yn) - an{s;Yn)\ . f ^ 

sup —T < G < 00 for some constant C > 0, 

s,tery-„ pn[s,t;Yn} 

G { (5) 

{t e Ty^ ■ (Jn{t; Yn) > 6\ is compact, and Go '■= sup sup — — < 00 . 

neN77>0,0<5<l 1 + ?? 

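(ii) There exists a sequence $(C_n)_{n \in \mathbb{N}}$ of measurable sets and positive constants $A, B, W, \alpha$ such
that

$$N\big( u\delta, \{ t \in T_{Y_n} : \sigma_n(t; Y_n) \le \delta \}, \rho_n(\cdot, \cdot; Y_n) \big) \le A u^{-B} \delta^{-W} \log(e/(u\delta))^{\alpha} \quad \text{for } u, \delta \in (0, 1]$$

whenever $Y_n \in C_n$.

For constants $q, Q > 0$ define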

A (A nv^ / \Znis;Yn) - Znit;Yn)\ ^ „ 

An{5,q,Q;Yn) := <^ sup , ^ — 1—/ — / ^- v \\a - Q 

I s,teTY„:p„{s,t;Y„)<S Pn[S,t;Yn)log{e/pn{s,t;Yn))'i 

Then there exists a constant C — C{Go, A, B, W, a, q,Q) > such that for < S < 1 

J \Zn{t;Yn)\ 



< G„ W^log(l/a„(t;y„)) + Gloglog (e/c7„(i; F„)) , c7„(t;F„) 

C^ra yt'i -In ) \ 

+ C \og{e I CTn{t;Yn))-^ on {t : an{t;Yn) < 5] 



Yn 



is at least P^^„(2(5, g, Q; Yn) Yn^ — C\og{e/5) ^ whenever Yn G C„. 

If in particular W^" {Cn) — > 1 and\\m.s\fi\nin^{An{5,q,Q-,Yn) Yn^ = 1 a.s., then the sequence 



C[ sup J \Zn{t^yn)\ _ Gn(w\og{l/an{t;Yn)) + G log log (e/a„(t; r„)) , a„ (t; F^) 
\terr„ [ crn\t\ in) \ 

is tight in {F^" ) -probability, provided that inf^sup^gT-^ cr„(t;y^) > a.s. 



Yn 






Remark Note that in case of G(r/, 5) = (k?7)1/'" with k > 1, 

G(w\og{l/5) + Cloglog(e/,5),(5) +C\og{e/6)-^ 
= {KW\og{l/5)f''' + o{\og\og{e/5)\og{e5f'''-^ 
= {kW log(l/5))i/'^ + o(l) as (5 \ 0. 

Proof. Due to the factorization lemma, the conditional probability and expectation factorize
under the above conditions, i.e. we may consider a sequence $(y_n)_{n \in \mathbb{N}}$ and work with the sequence
of conditional laws $\mathcal{L}(Z_n(\cdot; Y_n) \mid Y_n = y_n)$, but note that we do not assume equality of
$\mathcal{L}(Z_n(\cdot; Y_n) \mid Y_n = y_n)$ and $\mathcal{L}(Z_n(\cdot; y_n))$ in general. The first part of the proof is a modification of
the chaining in Dümbgen and Walther (2008, technical report) applied to the conditional distribution
$\mathcal{L}(Z_n(\cdot; Y_n) \mid Y_n = y_n)$ for $y_n \in C_n$. Here, however, we need to define their additive correction
function $H$ ($= H_n(\cdot; y_n)$ subsequently, due to the dependence on $n$ and $Y_n$ in our setting) in a
different way, taking into account the additional logarithmic terms in the bound of the covering
numbers. Following their arguments, a suitable choice for the correction function $H_n$ appears
to be

Gn\w\og(—^ -) + {B + a)\ogu{aJt;yn)) + (2 + a) log log — -), (T„(i;2/„)i 

= GJw\og( j )+ ((i3 + a)7+(2 + a))loglog( ^ ), a„(t;y„)l. 

This term is essential for our proof of efficiency. It is important that the constant $\alpha$ does not
influence the leading term. Concerning the tightness in probability as stated in the second part
of Theorem 10, notice that it does not follow by an immediate continuity argument because the
metric (and the metric space) change with both $y_n$ and $n$, hence some additional uniformity is
required. For $0 \le \delta < \delta' \le 1$ let $U_n(\delta, \delta'; Y_n)$ be defined by



sup 



^it;Y„)e(S,S'] 1 <^n{t; Yn) 

teT„ 



v'l ' " ^" ( Vgn(^; Yn))+C log log (e/a„(i;y„)), <Jn{t;Yn) 



First observe that for any fixed $K > 0$,

$$P\big( U_n(0, 1; Y_n) > K \mid Y_n \big) \le P\big( U_n(0, \delta; Y_n) > K/2 \mid Y_n \big) + P\big( U_n(\delta, 1; Y_n) > K/2 \mid Y_n \big). \qquad (6)$$

The first part of Theorem 10 implies that the first term on the right-hand side in (6) is bounded by
$1 - P(A_n(2\delta, q, Q; Y_n) \mid Y_n) + C \log(e/\delta)^{-1}$ for $K > 2C \log(e/\delta)^{-1}$ whenever $Y_n \in C_n$. Concerning
the second term in (6), note that

Un{S,l;Y„) < - inf i/„(J';r„) + i sup Z„(t;r„) 

<y„{t;Y„)>5 

Then the conclusion follows if we establish that 



K- 



lim limsupP sup Z„(t;F„) > if ; K„ G C„ 



Y„] =0 a.s. 
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For £ > and y„ G C„, let ii (?/„), tjn(y„)(jjn) be a maximal subset of with Pniti, tj \ ?;„) > e for 
arbitrary different indices i,j E {1, r7i(y„)}. Note that mijjn) < Ae^^ log(e/e)" by assumption 
(ii). Then condition (i) implies that 



(7) 



lim limsupP sup | ^„(ti(i/„); 1^„) | >K 



Yn = J/n =0 a.s. 



1—too \ i=l,...,m(j/„) 

On the other hand, we have on the set An{£, q, Q, Yn) the bound 

sup I sup I Zji(J^{(Yji^'^Yji 



(8) 



With e tending to zero sufficiently slowly, ((7|) and fS]) show together with the stochastic equicon- 
tinuity condition lim^x^o infn 1P( -4„((5, g, Q; Yn) K„ 1 =1 a.s. 



lim lim sup P sup Z„(t;K„) >K 



Yn^Hn] =0 a.s. 



Since the assumption inf„ supj^^y an{t;Yn) > a.s. guarantees 

lim suppft/„(y„) < -i^ |y„) = a., 

the tightness in (P^")-probabihty is proved. 



□ 



6.3. Proofs of the main results 

Proof of Theorem [2] Let A„ := m/n. In view of the Tj^nS, the behavior of the process 



conditional on Xn needs to be investigated, where A o Il\Xn is uniformly distributed on the set 
|a : Xn ^ {1/A„, -1/(1 - A„)} : ^ X{x) = 

For notational convenience it seems useful to redefine the process on the random index set 
fn {{X,,\\X,-X^\\2):l<j<n,0<k<n-l} 

via the map {j,k) i— > (Xj, \\Xj — X^ [j^) and extend it to a process (^(^j ^)) ^^g^- with T 
{{t,r) : t £ [0, 1]'', < r < max^g[Q \\x — t\\2} by the definition 



F„(t,r) := v^V^„(l- A„) / V 



dpn(a;)-dQn(x; 
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where Pjj^ and denote the empirical measures based on the permutated variables Xn(i), Xn(m) 
and Xn(„i+i), ...,Xn(„), respectively. Let 



Var(r„(i,r) 



X, 

\\t-x\\2 



rfH„(a;), 



with H„ the empirical measure of the observations ■■■,X„ 



In the sequel we make use of the results in the previous section twice - in order to prove the 
tightness and weak approximation in probability of the sequence of conditional test statistics and 
within the " loop" we use the chaining arguments again to establish a sufficiently tightened uniform 
stochastic bound for the covering numbers below. 

I. (Subexponential increments and Bernstein type tail behavior) The inversion of the
conditional Bernstein type exponential inequality in Proposition 8 shows that for any $\eta > 0$,



> G'„(ry,7„(i, r)) 



Xn] < 2exp(-ry), 



where 



1/2 



with 



Rnir) 5{m,n)- — . ^ , . ^ ■ 

3 mm(A„, 1 - A„)vnr 

Let the random pseudo-metric pn on T be defined by 
p„{{t,r),{t',r')f Var(y„(t, r) - y„(i', r') | 

with tptrix) := il' i^^^^—p^) ■ Then the application of the second exponential inequality of Proposi- 
tion [5] implies for any fixed {t,r), {t',r') G T that 



''(^\Y,,{t,r)~Yr,{t',r')\ > p„((t,r), (i',r')) gr/ A'„j 



< 2 exp(-?7), 



where 



g 2 1 



9A„(1- A„) 
2max(A„, 1 - A„)2 



:iog2)-' 



II. (Random local covering numbers) We need a bound for the local random covering numbers
$N\big( (u\delta)^{1/2}, \{ (t,r) \in \mathcal{T}_n : \gamma_n(t,r)^2 \le \delta \}, \rho_n \big)$. This is the most involved part of the proof. In
contrast to previous work we aim at a uniform stochastic bound. In order to establish a sufficiently
sharp upper bound, the following two claims are established:

(i) Let

P2,n((t,r),(t',r'))^ := J {i>trix) - Ipt'r'ix)) dUnix) 

and define d„ for arbitrary different points in 7^ via 

dn'' := max [E pl„, 4/n] ( 1 + C log (4 e/ max [E „ , 4/n] ) j , 



witii C a positive constant to be chosen later. Note that the map x 1-^ x^l + 2C\og{\/e/x) is 
subadditive for x e (0,1], hence d„ defines a metric. Furthermore let 7„^ := E7|„ — (E7i_„)^, 
where 



Then there exist a constant C" > and a sequence (C„)„£n of measurable sets with P®"* 
Q®^"~'"\C„) —^ 1, such that for any S > 0, u € (0,1] with > 4/n and any realization 
{Xi, ...,Xn) e 

Ar((«5)V2,|(f,r) gT; :%{t,rf < s].,pn) 

< N(^{uSy^\{{t,r)efn : 72,,.(i,r)2 < c"51og(e/5)4}, d„) , 

if ^ is not rectangular. In case of the rectangular kernel, the set 

{{t,r) e fn : 72,n(i,r)' < C'6\og{e/6)^} 

in the covering number has to be replaced by 

{{t,r)G%:^2At,rf < C'^log (e/J)'} U {(t,r) e T; : 72,„(i,r)2 > 1 - C'Jlog (e/J)'}. 

(ii) There exists a constant A > 0, independent of u,6 and n, such that whenever u6 > 4/n, the 
upper bound given in (i) is again bounded from above by Au~^'^~^^^ log [e/{u6))^^'^^^\ Moreover, 
the latter bound remains valid with T in place of 7^. 

Note that we cannot rely our bound directly on uniform covering numbers and Vapnik-Cervonenkis 
(VC) theory as the envelope I{X e Af^} only allows for a bound of order which would 

result in the loss of efficiency of the procedure, and a pre-partitioning of 7^ as used in the proof of 
(ii) seems to be rather involved. 

Proof of (i): We first derive a uniform stochastic bound for the random metric $\rho_{2,n}$. Recall that
every function $\psi$ of bounded total variation is representable as a difference of isotonic functions
$\psi^{(1)}$ and $\psi^{(2)}$. With the definition of the subgraphs

$$\mathrm{sgr}(\psi_{tr}^{(i)}) := \big\{ (x, y) \in [0,1]^d \times \mathbb{R} : y \le \psi_{tr}^{(i)}(x) \big\}, \quad i = 1, 2,$$

the set {sgr(i/'t^^) : [t, r) G T} has a VC-dimension bounded by c? + 3 (van der Vaart and Wellner 
1996) with envelope Ty(^). Consequently, the uniform covering numbers N{e,T) with 

$$\mathcal{F} := \big\{ (\psi_{tr} - \psi_{t'r'})^2 : (t,r), (t',r') \in T \big\}$$

are bounded by $C \varepsilon^{-a}$ for some real-valued $a > 0$ and some constant $C > 0$. The boundedness of
$\psi$ shows that $\mathcal{F}$ is uniform Glivenko-Cantelli in particular (see Dudley, Giné and Zinn 1991, for
instance). As an immediate consequence,



lim I 

n — ^oc 



> S 



TxT 



0, 



(9) 



for any $\delta > 0$. However, such a bound is not sufficient for our purposes. Because of $\|\psi\|_{\sup} \le 1$, the
squared random metric $\rho_{2,n}^2$ equals $1/n$ times the sum of $n$ independent random variables with absolute
values $\le 4$, hence



(p2,„((t,r),(i',r'))') < ^E(p2,„((t,r),(t',r'))') < max 1 1, e(p2,„ ((t, r), (t', /))') 



Var 



Now the application of Bernstein's exponential inequality (see Shorack and Wellner 1986) entails 



P2,„((t,r),(t',r'))'-Ep2,n((t,r),(t',r'))' 



max 



A/n,Ep2,4it,r),{t',r')y 



> 77 I < 2 exp 



VV2 
l + r?/3 



3 9 
< 2 exp ( — -?7 + - 



for arbitrary points (t, r), {t\ r') e T. I.e. p| „ — Epj^ni standardized by max {4/n, Ep2.„}, has (uni- 
formly) subexponential tails. Analogously, the process p| „ — Ep| ^ has subexponential increments 
with respect to the metric Z?„ given by 



llnMpl^{a)-pl^{h)) I{a^h], a,beTxT. 



bn{a,h) m 

Note that max[4/n, Ep^ „] is Lipschitz continuous with respect to Z)„. Theorem [5] shows that the 
above ingredients imply that lim^x^o infn F(.4„((5, 1,Q; <¥„) | X,^ — 1 for some adequately chosen 
Q > 0, where we use the definition of An from Theorem [TOl with y„ = Xn and Z„ = pi « ~ Ep^ n- 
Now we may apply the latter to conclude that there exists some universal constant C > such 
that the probability of the event 



P2,„((t,r), (*',/))' -Ep2,„((i,r),(i',r'))' 



> 



(10) 



C max [4/n, Ep2,„((i, r), {f , r')f] log (4 e / max [4/n, Ep2,„((i, 0, {t\ r'))' 
for some (i, r), (t', r') with Ep2,n((i, r), (t', r'))^ < 6' 



is bounded by some function $\varepsilon(\delta')$ independent of $n$ with $\lim_{\delta' \searrow 0} \varepsilon(\delta') = 0$. Since the probability
in (9) is antitonic in $\delta$ for any fixed $n$ with limit $0$ as $n \to \infty$ for any fixed $\delta$, there exists a
sequence $\delta_n \searrow 0$ along which the result of (9) still holds true. Thus, combining (9) and (10) for a
sequence $\delta' = \delta'_n \searrow 0$ sufficiently slowly implies the existence of a sequence of sets $(A_n)_{n \in \mathbb{N}}$ with
$P^{\otimes m} \otimes Q^{\otimes (n-m)}(A_n) \to 1$ such that

$$\rho_{2,n} \le \max\big[ 4/n,\ E\rho_{2,n}^2 \big]^{1/2} \Big( 1 + C \log\big( 4e / \max\big[ 4/n,\ E\rho_{2,n}^2 \big] \big) \Big)^{1/2} \quad \text{whenever } X \in A_n.$$

The treatment of the random set

$$B_\delta := \big\{ (t,r) \in \mathcal{T}_n : \gamma_n(t,r)^2 \le \delta \big\}$$

is similar in spirit but more involved because the random quantity $\gamma_n^2$ is not representable as a
sum of independent variables. However, we can use the decomposition $[(n-1)/n]\, \gamma_n^2 = \gamma_{2,n}^2 - \gamma_{1,n}^2$.
Before deriving a stochastic bound, we notice the following: If $\psi$ describes the rectangular kernel,
we have $\gamma_{2,n}^2 = \gamma_{1,n}$, i.e.

$$\gamma_{2,n}^2 - \gamma_{1,n}^2 = \gamma_{2,n}^2 \big( 1 - \gamma_{2,n}^2 \big).$$

In this case, the random set Bs is consequently contained in the union 

{72%<2<^}u{72'„>l-2<^}- (11) 
Consider the general case. Using that 

Var(7i,„(t,r)) = — ^ (EV't,(XO' - (lE^t.(XO)') < -E(72,„(i, r)^) (12) 



and 



1 " 1 

Var(72.„(i,r)') = — ^ (EV^t.CX,)^ - (E7/.t,(X0')') < -E{j2,n{t,rf) , (13) 

i=l 



we may apply the above chain of arguments for n l*-" 7i,n ^^'^ 72 n together with the upper 
bounds in (jl2[) and (jl3p for the standardization respectively and obtain the existence of a constant 
Ci > such that 



Cimax[l/n,7| ]^^^ / / 2 i/2\ 

7i,n ^^^^ log [ey/n j max 72^„J J 

1 /2 

< 7i,n < 7i,n + log(^eVn/ max[l/n,72^„J j 

whenever X G I^n for some sequence (I?„)„gN with asymptotic probability 1, uniformly evaluated 
at (t,r) S 7^. Note that 7i^„ > 7|„ > 1/n for all (t, r) e 7„. The same holds true with a 
constant C2 > and a sequence (I?^)„gN with asymptotic probability 1 and 7'i^„ and 7i^„ replaced 
by 7! „ and 7|„. Using the lower bound for 7|„ and the upper bound for 71. n, a bit of algebra 
yields 

B& C |72,„-7?,„ < <5 + max[l/n,72„]^^^-^log(eV^/max[l/n,72%]^^^) 

whenever X g I?„ n I?^, (5 > Here and from now on, K denotes some universal constant, not 
dependent on n and (t, r). Its value may be different in different expressions. Now we first consider 
the case 



sup sup 

nGN (t,r)eT 



(lln/lln) < C" < 1. 






Then the above condition shows that 



llni'^-C) < 6 +max[l/n,7y '^'^ log (eV^/ max [1/^,72 

1 1/2 K / / l/2\^l 

< 2max|(5, max [l/n,7|_„] log (^eV^i/ max [l/n, 72_„] j j, 

which entails that 7|„ < i^^log (e/S)'^ for ^ > 1/n by the isotonicity of a; > a; log(e/a:)'' on (0, 1]. 
On the other hand, the case 

sup sup (7?,„/7l,„) = 1 (14) 

implies already that $\psi$ is equal to the rectangular kernel: If the sup is attained, this is obvious.
Otherwise, the equicontinuity of $(h_n)_{n \in \mathbb{N}}$ and its uniformly bounded $L_1$-norm $\|h_n\|_1 = 1$ imply its
uniform boundedness, hence relative compactness in the topology of uniform convergence by the
Arzelà-Ascoli theorem. Therefore there exists at least a uniformly convergent subsequence $(h_{m(n)})$ with
(uniformly) continuous limit, say $h$, along which this result holds true as well, because the supremum
in (14) depends continuously on the mixed density. This however implies that $\psi$ describes the rectangular
kernel, because the uniform limit $h$ of that subsequence is bounded away from zero. Hence in case
of (14), we consequently obtain by (11)

65 C {72^^„ < i^(51og(e/J)'*}u {72 „ > 1 - i^(5 log (e/(5)'^} whenever Xel?„nl?;, (5 > 

Proof of (ii): Since $\psi$ is of bounded total variation, there exists some finite measure $\mu$ such that
for any $0 \le z_1 < z_2 \le 1$, $|\psi(z_1) - \psi(z_2)| \le \mu((z_1, z_2])$. With



M,{t,t',r,r') :-- 



\\t~x\[ 



A 



0, 



\t'-xh 



we obtain 



Jp2,„((i,r),(t',r')) < J {^trix)~yJt'r'{x)) dUnix) 
< K f fi{AL,{t,t',r,r'))dmr,{x) 



K l{yeKUt,t',r,r')}<mn{x)dijL{y) (Fubini) 



< K sup / /{ye Af,(t,t',r,r')}dH„(a;). 
ae[o,i]. 



(15) 



Then y e M^{t,t' ,r,r') implies that x G Bt{ry) ABf {r' y) . Since hn is uniformly bounded from 
above, we obtain that is not greater than KX(^Bt{r)ABt' {r')Y Consequently, dn < K d if 
dn > 4/n with the metric d defined below in (fTB]) . due to the isotonicity of a; 1-^ a;(l + Clog(e/a;)) 
for a; G (0, 1], C > 0. 4' attains its maximum 1 at 0, hence there exists some r* > such that 
V'dl^^lb) > 1/2 whenever ||a;||2 < r* . Using in addition the uniform boundedness of /i„ away from 
zero we obtain 72, n 

it, r)^ > K ■ r'^ [t, r) G T. We now start bounding the covering numbers 



N(^{u6f/', [it,r) G r : 72,„(i,r)2 < K6logie/6f}, 
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where the metric d on T x T is pointwise defined by 



d((t,r),(t',r'))' := \{Bt{r) ABt- {r')) + C \og[v e j \{Bt{r)^Br{r'))^ (16) 
with V = X{Bo{^/d)) the volume of the d-dimensional Euchdean ball with radius ^/d. Again by the 

~ 1/2 

isotonicity of X I— > a; log(e/a;) for X e (0, 1], the inequality o?((t, r), (t', r')) :— X(^Bt{r)ABti {r')^ < 
e/-\/log(y e/e^) implies that d(^{t, r), {t' ,r')) is not greater than (2C+l)^/^e. Thus in order to finish 
claim (ii), it is sufficient to bound 



N 



First note that there exists a finite collection of at most m < K/ {S log(e/(5)^) points ii, tm such 
that the set |(i,7') € T : r''- < d\og{e/S)^^ is contained in the union U™^^^ with 

A := |(t,r)er:Bi(r)ci?t,([i^'<51og(e/(5)4]i/'^)| 

for some universal K' > 0. The rotation and translation invariance of the Lebesgue measure leads 
to the rescaling invariance for the covering numbers 

iv(£i/2, {(^t,r) : B,ir) C Bo{R)}, d) - N (^{s/ R'')'/\ {{t,r) : i?,(r) C i?o(l)}, d). (18) 

But a minimal d- {e / R'^f/'^-nct of the set {(t,r) e T : Bt{r) C Bo{l),r = r'] for some fixed 
r' > e^^'^/R contains not more than M = K[R'^/e]'^ elements (ti,r'), (tM,r') with K uniformly 
in r' e (e^/'^/i?, noticing that X{Bt{r)ABt'ir)) < K\\t - t'||2r''~^ and r < Vd. Now fix a 
A'(£/i?'')-net ti, ...,tM with respect to ||.||2 and observe that X{Bt{r)ABt{r')) < Kr'^-^{r-r') for 
r > r\ r < \fd, which shows that the quantity (fT8|) is bounded by KiR!^ j eY^^ (with K uniformly 
in £ and i?). Correspondingly, this holds true for ((m(5/ log[e/(w(5)])^/^, .4^, d) , hence the covering 
number pT|) is bounded by Ab^^u^^'^^^^XogiejuSf''^'^^^'^ for some universal constant A > 0. An 
analogous bound holds for 7^ in place of T (and M(5 > 4/n): If (ti,ri), {tk,rk) denotes an e-net 
with respect to d in _B C T, we may define a 2e-net (ti, fi ),..., (t^jf^) in 7^ Hi? via the definition 
{ti,ri) :— argmin^^ ^^^^^ d((t, r), (ti, Ti)). The corresponding covering numbers in case of the 

rectangular kernel for the sets {7|„ > 1 — KSlog (e/(5)*} can be treated with similar arguments, 
which concludes the proof of (ii). 
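
The bound $\lambda(B_t(r) \triangle B_{t'}(r)) \le K \|t - t'\|_2\, r^{d-1}$ used above can be made explicit in the plane. The following Python sketch is an illustration only, restricted to $d = 2$: it evaluates the area of the symmetric difference of two discs of equal radius from the closed-form lens area and checks it against $4 \|t - t'\|_2\, r$, i.e. $K = 4$. The function name and the grid of test values are assumptions for this example.

```python
import math


def sym_diff_area(r, s):
    """Area of the symmetric difference of two discs of radius r in the plane
    whose centers are at distance s."""
    if s >= 2 * r:
        # Disjoint (or touching) discs: the symmetric difference is both discs.
        return 2 * math.pi * r * r
    # Area of the lens-shaped intersection of the two discs.
    lens = 2 * r * r * math.acos(s / (2 * r)) - (s / 2) * math.sqrt(4 * r * r - s * s)
    return 2 * (math.pi * r * r - lens)


radii = [0.05, 0.2, 1.0, 1.4]
shifts = [10 ** (k / 4) for k in range(-24, 5)]  # from 10^-6 up to 10
bound_holds = all(
    sym_diff_area(r, s) <= 4 * r * s + 1e-12 for r in radii for s in shifts
)
```

The constant $4$ is attained in the limit $\|t - t'\|_2 \to 0$, since the derivative of the area with respect to the distance $s$ of the centers equals $2\sqrt{4r^2 - s^2} \le 4r$.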

In order to line up with the requirements of Theorem IIOI let us remark that the proof of that 
chaining requires only the special choice u = u{d) — log{e/S)"' for some exponent 7 < 0, which 
entails that 5 < n^^(logn)" for some a > in case uS < A/n. But for any a' > 0, ^{{t,r) e Tn : 
r"^ < Kn^^{logn)°' } = X^ILi tt{("'^»'^) ^ ■ r"^ < A'n~^(logn)"'}, and with the same arguments 
as used in (i) we obtain for rjj — n^^(logn)" that the inequality ]HI„(-Bt(r„)) < ArA(i3t(r„)) logn 
holds, uniformly in t G [0,1]'*, with asymptotic probability 1, which entails ji{(t, r) G 7^ : r"* < 
Arn~^(logn)" } — Op(n(logn)" ) for some a" > 0. 



III. (Tightness and weak approximation in probability) As a consequence of the above
exponential inequalities in step I and the bound for the uniform covering numbers $N(\delta, T)$, Theorem
9 shows

Xn \ =0, (19) 



lim lim sup P sup ^— — — 7;—- — — ^ > e 



where the sup within the brackets is even running over elements of $T \times T$. Now the application of
Theorem 10 entails that $\mathcal{L}(T_n \circ \Pi \mid X_n)$ is tight in $(P^{\otimes m} \otimes Q^{\otimes (n-m)})$-probability. What remains
to be proved is the weak approximation. Starting from (19), the uniform convergence (9) implies
in particular the asymptotic stochastic equicontinuity

for ah e > 0. 



limlimsupEf )P* sup \Y„{t,r) - Y„it' ,r') \ > e 

S\0 n^oo " V p„((t,r),(t',r'))<5 

Since for any subsequence of the metric $\rho_n$ there exists some uniformly convergent subsubsequence
as a consequence of the relative compactness of $(h_n)_{n \in \mathbb{N}}$ in the uniform topology, it suffices (via
proof by contradiction) for the weak approximation in probability



d^^c(^{Yn{t,r))^t^r^^r -^n) , /:((2'„(t, r)) | 







to establish the convergence of finite dimensional distributions. Here, is defined via the outer 
expectations E*. For let {(ti,ri), {tk,rk)} be a collection of points from T. Denote furthermore 
artiX,) -.^n-^/^^/XjI^l^^triX,). Then 

V J(t.r)er Vfrt /(t,r)er 

with the i'th nearest-neighbor of t within Xn- Let (Z„(t, r))^^ r)eT pointwise be defined by 

Z„{t,r) := ^A„(l - A„) / (j)';!l\x) dW (x) . Using that 2cov {Xi,X2) equals Var(Xi) + Var(X2) - 
Var(Xi— X2) for two random variables Xi and X2, one finds that [(n— l)/n]cov (Yn{t, r), Yn{t\ r') \X„) 
equals 

~\ J {'^trix) - '4>t'r'{x)y dMn{x) + ^(^J (^Iptrix) ~ tjjt' r' (x)'^ dMn{x)j Xptrixf dMn{x) 

(20) 

~ ^tr{x)dmn{x)^ + \j i^t'r'{xfdm^{x) - \ { I V'tV (x)dH„ (x)' 

Replacing the empirical measure ]Ht„ by its expectation IHI„, the above six expressions in (|20p 
coincide with the covariance cov {Zn{t,r),Zn{t',r')) of the limiting process Z„. Define a^.j . '■— 
ELi ai%iX,), j = 1, k. Since 

J^max,(4;).(X,)-4:i.)^ , , 



and $\big| \mathrm{cov}(Y_n(t,r), Y_n(t',r') \mid X_n) - \mathrm{cov}(Z_n(t,r), Z_n(t',r')) \big| \to 0$ in $(P^{\otimes m} \otimes Q^{\otimes (n-m)})$-probability by an application
of the weak law of large numbers for triangular arrays to each of the expressions in (20) separately,
Hájek's Central Limit Theorem for permutation statistics, extended to the multivariate setting,
yields the desired weak convergence in probability of the finite dimensional distributions. For
notational convenience, define



T^id.S') sup ||T,fe„on| - C,kn} 



(5<7„ {3,k)<S' 



and 



SniS,S) := sup I v^21og(l/7„(t,r) ) 

<5<7„(t,r)<(5' 

Since snp^^^yf^^d{t,Tn) -^p8,™^Q8,(„-™) and sup(j ^^(j- j.)>5 | Cjfe„-(2 rjfe„)i/2 
as n — > oo, it follows from the above established results that 



for any fixed 6 E (0, 1]. An application of Theorem [TUl as well as its subsequent Remark imply that 
limlimsupEP(rn(0,(5) >e| a;,) = and lim limsupP(5„(0, (5) > e) = 

for any e > 0. Thus, because obviously lim^x^o limsup^^^^ P(S'„((5, 1) < — e) = 0, we obtain 

d„(/:(rn(o,i)|A'„), /:(5„(o,i))) ^pj,„^^«<„_, o. □ 



Proof of Theorem 3. Let $C$ be some compact rectangle of $J$. Fix $\beta > 0$. For any integer $k > 1$
let $C_{n,k} \subset C$ be some maximal subset of points such that $\|x - y\|_2 \ge 2k\delta_n$ and $B_x(k\delta_n) \subset C$ for
arbitrary different points $x, y \in C_{n,k}$. Then $\# C_{n,k} \sim (k\delta_n)^{-d}$. Now let $\phi_{x,n}$ be the solution of the
subsequent optimization problem:

(*) Minimize II5II2 under the constraints 



G Hd(/3,L;R'*), supp(g) C B,(fc,5„), g{x) ^ 15^ / g{z)^hn{z)dz = 



These constraints define a closed and convex set in i2([0, 1]^) which is non-empty for k sufficiently 
large (and uniformly in n due to the equicontinuity of and the rescaling property, see subse- 
quently to (j24p below). Consequently in the latter case, the argmin ipx.n exists and is unique. The 
resulting density candidates 

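$$p_{x,n} = h_n \cdot \big( 1 + (1 - m/n)\, \phi_{x,n} / \sqrt{n} \big) \quad \text{and} \quad q_{x,n} = h_n \cdot \big( 1 - (m/n)\, \phi_{x,n} / \sqrt{n} \big)$$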

are non-negative and thus contained in J^^™'"^ as soon as additionally 



< <Vx,n\-) < — — — for all a; e C< 



1 — m/Tl 771/71 
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This is guaranteed for sufficiently large n when sequence {5n)neN tends to zero. For any statistical 
level-a-test ip = ip{l3, L, /i„) : R'*''" —> [0, 1] for testing the hypothesis '> = 0" it holds true that 

niin E(m,„,p^ - « < min E(„,„,p^^,,^_„)i/' - IE(m,n,/i„,/t„)V' 



1 y 



(21) 



For short we write $E_0$ for $E_{(m,n,h_n,h_n)}$ in the sequel. Note that the test is allowed to depend
on the nuisance functional $h_n$ (in fact the log-likelihood and its distribution do). Now we aim
at determining $\delta_n$ such that the right-hand side tends to zero as $n$ goes to infinity. Although
$\lambda\big( \mathrm{supp}(\phi_{x,n}) \cap \mathrm{supp}(\phi_{y,n}) \big) = 0$ for any different $x, y \in C_{n,k}$, the likelihood-ratios

are not independent. However, they are independent conditional on the random vector A„ = 
(Aa;_„)a.gCfc,„ with entries 

^x,n ■= (iii < m : \\Xi - x\\2 < kSn }, > m : \\Xi- x\\2 < k6„ }y 

Note that Eo(ia;,„|A„) — KoL^^n = 1- Following at this point standard truncation arguments as, 
for instance, in Diimbgen and Walther (2009), proof of Lemma 7.4, it turns out to be sufficient for 
the convergence to zero of (PT|) to find Sn and 7 = 7„ € (0, 1] such that the ratio 

max —}—EoLl-%^ (22) 

tends to zero as n goes to infinity. But 

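$$\mathbb{E}_0 L_{x,n}^{1+\gamma} \;\le\; \Big(1 + \tfrac{1}{2}\gamma(1+\gamma)\big(1 + O(\delta_n^{\beta})\big)(1 - m/n)^2 \int \phi_{x,n}(z)^2\,dz\Big)^{m} \times \Big(1 + \tfrac{1}{2}\gamma(1+\gamma)\big(1 + O(\delta_n^{\beta})\big)(m/n)^2 \int \phi_{x,n}(z)^2\,dz\Big)^{n-m}, \qquad (23)$$

using the bound $(1+\lambda)^{1+\gamma} \le 1 + (1+\gamma)\lambda + 2^{-1}\gamma(1+\gamma)\lambda^2 + 3\gamma\lambda^2|\lambda|$ for $|\lambda| \le 1$. Now let $\phi_k$ be the solution to the following optimization problem

(**) Minimize $\|g\|_2$ subject to

$$g \in \mathcal{H}_d(\beta, 1; \mathbb{R}^d), \quad \mathrm{supp}(g) \subset B_0(k), \quad g(0) = 1, \quad \int g(x)\,dx = 0. \qquad (24)$$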
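Notice the rescaling property $L\delta_n^{\beta} g(\cdot/\delta_n) \in \mathcal{H}_d(\beta, L; \mathbb{R}^d)$ with $\mathrm{supp}(L\delta_n^{\beta} g(\cdot/\delta_n)) = B_0(\delta_n k)$ and $L\delta_n^{\beta} g(0) = L\delta_n^{\beta}$ $\Leftrightarrow$ $g \in \mathcal{H}_d(\beta, 1; \mathbb{R}^d)$ with $\mathrm{supp}(g) = B_0(k)$ and $g(0) = 1$. Due to the equicontinuity of $(h_n)_{n\in\mathbb{N}}$,

$$\lim_{n\to\infty} \sup_{x\in C_{k,n}} \sup_{z\in B_x(k\delta_n)} |h_n(x) - h_n(z)| = 0,$$

whence

$$\int \phi_{x,n}(z)^2\,dz = (1 + o(1))\, L^2 \delta_n^{2\beta+d} \|\phi_k\|_2^2 \qquad (25)$$

because the minimum in (*) depends continuously on the mixed density $h_n$, as can be seen using a Lagrange multiplier for the centering constraint. Note that the $o(1)$-term is uniform in $x \in C_{k,n}$. Now the combination of (23) and (25) shows that for $\delta_n$ sufficiently small, (22) is bounded by

$$\exp\Big(\tfrac{1}{2}\, n (m/n)(1 - m/n)\,\gamma(1+\gamma)\, L^2 \delta_n^{2\beta+d} \|\phi_k\|_2^2\,(1 + o(1)) - \gamma \log(\# C_{k,n})\Big).$$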

By construction, $\# C_{k,n} \ge d_k \cdot \delta_n^{-d}$ for some constant $d_k > 0$. Now fix $\delta > 0$ and define

$$c_k(\beta, L) := \Big(\frac{2 d L^{d/\beta}}{(2\beta + d)\,\|\phi_k\|_2^2}\Big)^{\beta/(2\beta+d)}.$$

Observe that the sequence $c_k(\beta, L)$ is increasing in $k$. We need to check that $\lim_{k\to\infty} \|\phi_k\|_2 = \|\gamma_\beta\|_2$. Note that, in contrast to $\phi_k$, the solution of (3) does not integrate to zero in general, and it remains still open if $\gamma_\beta$ is compactly supported for $d > 2$ and $\beta > 1$. Starting from $\gamma_\beta$, it is sufficient to construct a sequence $\gamma_{\beta,k}$ satisfying the constraints of the optimization problem (**) such that $\lim_{k\to\infty} \|\gamma_{\beta,k}\|_2 = \|\gamma_\beta\|_2$. Then the equality $\lim_{k\to\infty} \|\phi_k\|_2 = \|\gamma_\beta\|_2$ follows from $\|\gamma_{\beta,k}\|_2 \ge \|\phi_k\|_2$. The existence is sketched in the appendix of the extended version of this article. As a consequence there exists some $k' \in \mathbb{N}$ such that $c(\beta, L)(1 - \delta) < c_{k'}(\beta, L)(1 - \delta/2)$. Now one verifies that the lower bound is established with the choice

^" [ L ) 

and some sequence 7 = 7n — > with lim„ 7„(logn)^/^ — 00. □ 
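As a numerical sanity check on the elementary inequality used to derive (23), the following minimal Python sketch evaluates the difference between its right-hand and left-hand side on a grid of $(\lambda, \gamma) \in [-1,1] \times (0,1]$. It also confirms the identity $m(1-m/n)^2 + (n-m)(m/n)^2 = n(m/n)(1-m/n)$, which turns the product in (23) into the exponent appearing in the subsequent bound. The function names and the grid resolution are illustrative choices of ours and are not part of the original argument; a grid evaluation is of course no substitute for a proof.

```python
def taylor_gap(lam: float, gamma: float) -> float:
    """Right-hand side minus left-hand side of the bound
    (1+lam)^(1+gamma) <= 1 + (1+gamma)lam + gamma(1+gamma)lam^2/2 + 3 gamma lam^2 |lam|."""
    lhs = (1.0 + lam) ** (1.0 + gamma)
    rhs = (1.0 + (1.0 + gamma) * lam
           + 0.5 * gamma * (1.0 + gamma) * lam ** 2
           + 3.0 * gamma * lam ** 2 * abs(lam))
    return rhs - lhs


def min_gap(grid_size: int = 201) -> float:
    """Smallest gap over a grid of gamma in (0, 1] and lam in [-1, 1]."""
    gammas = [(i + 1) / grid_size for i in range(grid_size)]
    lams = [-1.0 + 2.0 * i / (grid_size - 1) for i in range(grid_size)]
    return min(taylor_gap(lam, gamma) for gamma in gammas for lam in lams)


def variance_weight(m: int, n: int) -> float:
    """m(1-m/n)^2 + (n-m)(m/n)^2, the total weight of the squared L2-norm in (23)."""
    ratio = m / n
    return m * (1.0 - ratio) ** 2 + (n - m) * ratio ** 2


if __name__ == "__main__":
    print("minimal gap on the grid:", min_gap())
    print("weight for m=30, n=100:", variance_weight(30, 100))
```

On the grid the gap is non-negative up to rounding, and the weight agrees with $n(m/n)(1-m/n)$, as the exponent in the bound for (22) requires.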



Proof of Theorem 4. By virtue of Theorem 2 the sequence $\mathcal{L}(T_n \circ \Pi \mid X_n)$ is tight in $(P^{\otimes m} \otimes Q^{\otimes (n-m)})$-probability, resulting in stochastic boundedness of the sequence of random quantiles $(\kappa_\alpha(X))_{n\in\mathbb{N}}$. The bounded total variation of the kernel for $\beta \le 1$ is a consequence of its monotonicity; for $\beta > 1$ it results from the continuous differentiability of $\psi_{\beta,K}$ and its compact support. For notational convenience the dependency on $\beta$ and $K$ is suppressed. They are arbitrary but fixed unless stated otherwise. First note that for any random couple $(\hat t_n, \hat r_n)$ it holds true that

Hence it is sufficient to prove that for any sequence {(t>n)neti of admissible alternatives there exists 
a random sequence of (jn,fcn)nGN with — ^ — *p®"»6?o® <"-'") '^"^ proof of 
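Theorem 2 define $\gamma_n(t,r)^2 := \mathbb{E}\gamma_{2,n}(t,r)^2 - (\mathbb{E}\gamma_{1,n}(t,r))^2$, $(t,r) \in \mathcal{T}$. Let $t_n := \arg\max_{x\in[0,1]^d} |\phi_n(x)|$ and $r_n := (\|\phi_n\|_{\sup}/L)^{1/\beta}$. Define $(\hat t_n, \hat r_n) := (X_{\hat j_n}, \|X_{\hat j_n} - X_{\hat k_n}\|_2)$ with

$$(\hat j_n, \hat k_n) := \arg\min_{j,k \in \{1,\dots,n\}} \lambda\big(B_{t_n}(r_n) \,\triangle\, B_{X_j}(\|X_j - X_k\|_2)\big).$$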

Now let the process on T pointwise be defined by 

x/A„(l- A„) ^ /||X,-<||2 



witli 



A(X,). 



Furthermore, let us introduce the random variables $(\hat t_{ni}, \hat r_{ni})$, based on the indices $(\hat j_{ni}, \hat k_{ni})$ which are defined analogously to $(\hat j_n, \hat k_n)$ but with the minimum running over the set $j, k \in \{1,\dots,n\} \setminus \{i\}$ only. Then, recalling the definition $\psi_{tr}(x) :=$



'^n (in 1 ^r? 



E 



771 



v/A„(l-A„) 1 



i=l 

n 



2 — 771+1 



< 



v/A„(l-A„) 1 



n 



i=l 

n 



i—m+l 



v/A„(l-A„) 1 



n 



1=1 



i=m+l 



< 



v/A„(l-A„) 4 maxf- " 

77 ^ 7=|jV||sup max I — , I 



^A„(l-A„) 1 



1^ Ey (^„.r,„(^) ~ V'i„r„(a;)jp„(x)dx 



(26) 



n ~ m 



i—jn+l ' 



whereby we used for the first term in the last inequality that $(\hat t_{ni}, \hat r_{ni})$ differs from $(\hat t_n, \hat r_n)$ for at most two indices $i, j \in \{1,\dots,n\}$; the second term follows by including and evaluating the conditional expectation given $(\hat t_{ni}, \hat r_{ni})$, as $X_i$ is independent of $(\hat t_{ni}, \hat r_{ni})$. Replacing again $(\hat t_{ni}, \hat r_{ni})$ by $(\hat t_n, \hat r_n)$, the second expression behind the inequality in formula (26) is bounded by



a/A„(1 - A„) 4 / n n 
t: ^ ^ -0 sup max I 



m n — m 



Now we can make use of the fact that |p„(a;) — qn{x) 



i'^Sj^') - i^t^r^ixij (p„{x) - qn{x) 



dx 



(t)n{x)^Jhn{x) I < C\\4>n\[ 



(27) 



with 



$C := \sup_n \sup_x |\sqrt{h_n(x)}|$. Recall that $\|h_n\|_{\sup}$ is uniformly bounded due to the equicontinuity assumption on $(h_n)_{n\in\mathbb{N}}$ and the constraint on the $L_1$-norm $\|h_n\|_1 = 1$, whence the term in (27) is not greater than

^ Vn\\(l)n\\sup 

Using the bounded total variation $TV(\psi)$ of $\psi$, and with $M_x$ and $\mu$ as defined in the proof of Theorem 2, the integral which appears in (28) can be bounded by



E 



dx I . 



(28) 



E 



da; 



/{ye Mx{tn,rn,tn,rn)}dxdfi{y)j (Fubini) 
< TViij)E sup ( / l{yeM,itn,rnX,rn)}dx 



(29) 



using in the last bound besides the stochastic convergence rate n ^1'^ the uniform intcgrability of 
the sequences {n^l'^\^u~'K\\-i). (?^^^''|^^^ - ^-nl) which result from P(||?„-t„ Ha > x) = Vii=\'^[^^ ^ 
Btjx)) ^ (1 - X{Bt„ix) n [0,1]'^))" (= (1 - Vx^r if -Bt„(x) C [0, 1]'^) and P(|f„ - r„| > < 
2P(|jt„ — tn\\2 > x/2y Here, V denotes the volume of the d-dimensional Euclidean unit ball, i.e. 
V 7r''/2r(d/2 + 1). Together with ^ - ^ this shows that for any sequence of admissible 
alternatives (0n)neN 



If in particular 



We need to check that 



(^n ; Tn ) 

0[[\\ogn)/n) 

0(^(logn)('^+''/2^i)/(2'3+'')n"(2/3/d)/(2/3+d) 



+/3„-l/'i+l/2 



, the term in (j30l) is of order 



(30) 



O'n (^n 7 ) 



1. 



(31) 
For this we use the decomposition $[(n-1)/n]\,\hat\gamma_n(t,r)^2 = \gamma_{2,n}(t,r)^2 - \gamma_{1,n}(t,r)^2$. To this end note first that

\ln,lit ni'^n') Tn,! (^715 ''"n) | 

1 ^ 

< II ^s. - II sup- E e ^„ (^") n ('^") } 

1 " 

2=1 

< II vt„?„ - ^ur„ iL„p^ i: /{x, e } 

1=1 

1 " 

+ 2||i^||,up-^/{x, e B^Jf„)ASt„(r„)} 

= Op(l)Op(rfj + Op(r^i7i-i/'*) - Op(7,.i(t„,r„)). 

The "$o_p(1)$"-term results from the Hölder continuity of $\psi$ (for $\beta > 1$ the first derivative of $\psi$ is uniformly bounded on $[-K, K]$), $\mathrm{supp}(\psi_{\hat t_n \hat r_n} - \psi_{t_n r_n}) = B_{\hat t_n}(\hat r_n) \cup B_{t_n}(r_n)$ and the fact that $r_n > (c(\beta, L)\rho_{m,n}/L)^{1/\beta}$ while $\hat t_n - t_n \sim n^{-1/d}$, $\hat r_n - r_n \sim n^{-1/d}$. The case $i = 2$ is done analogously (taking the square). To verify (31) it remains to be shown that $\hat\gamma_n(\hat t_n, \hat r_n)/\gamma_n(\hat t_n, \hat r_n) - 1 = o_p(1)$, which however is a simple consequence of Chebyshev's inequality since for any $\beta > 0$ and any sequence of admissible alternatives $(\phi_n)_{n\in\mathbb{N}}$, the sequence $\gamma_n(t_n, r_n) \sim r_n^{d/2}$ or some subsequence decreases (if it decreases) at a slower rate than $n^{-1/2}$. The above considerations show in particular that

^„fe„« " 3 j^0(?w, n) ^^^^ f7n(r„, f„)~^] + (5(m,n)W21og('7„(f„,f„)- 

= ^2 log (7„(t„,r„)-2^ + Op(l), 
using in addition that 5{m^n) = 1 + 0(n^^/^). Consequently, 



T-^ ~ G-^ = O 



'p{l) + (l + Op(l)) - j21og(7„(t„,r„)-2), (32) 



and it has to be verified that the latter quantity goes to infinity. Recall that

$$\gamma_n(t_n, r_n)^2 = \int_{[0,1]^d} \psi_{t_n r_n}(x)^2 h_n(x)\,dx - \Big(\int_{[0,1]^d} \psi_{t_n r_n}(x) h_n(x)\,dx\Big)^2 = \big(1 + O(r_n^d)\big) \int_{[0,1]^d} \psi_{t_n r_n}(x)^2 h_n(x)\,dx. \qquad (33)$$

We first assume that $r_n = o(1)$, i.e. $\|\phi_n\|_{\sup} = o(1)$. Using that

$$\lim_{\delta \searrow 0} \sup_n \sup_{t\in[0,1]^d} \sup_{x\in B_t(\delta)} |h_n(x) - h_n(t)| = 0,$$

which follows by the same argument as used in Theorem 3, and the fact that any sequence of centers $(t_n)_{n\in\mathbb{N}}$ has a convergent subsequence by the compactness of $[0,1]^d$,



^ (t r) ^ V^VUl^K)— 1/2 (1 + »(!))■ 



(34) 



Using the approximation in (33) we obtain analogously

1/2 

V't„r„(a;)^(ix ) 

'[OA]" 



y21og(7„(t„,r„)-2) = 2log(^1^0{l) J ^ Vt„r„ (x)^^^^ 



(35) 



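Recall that $\psi = \psi_{\beta,K}$ with $K$ the bound of the support. Standard calculation shows that the bounded $L_2$-norm of $\gamma_\beta$ implies

$$\frac{\big|\int \psi_{t_n r_n;\beta,K}(x)\phi_n(x)\,dx\big|}{\big[\int \psi_{t_n r_n;\beta,K}(x)^2\,dx\big]^{1/2}} = \frac{\big|\int \psi_{t_n r_n;\beta}(x)\phi_n(x)\,dx\big|}{\big[\int \psi_{t_n r_n;\beta}(x)^2\,dx\big]^{1/2}}\,(1 + \varepsilon_K) \quad\text{with } \varepsilon_K \to 0 \text{ as } K \to \infty,$$

but note that the total variation $TV(\psi_{\beta,K})$ is increasing in $K$. Define now $\delta_n := (1 + \delta)c(\beta, L)\rho_{m,n}$. Then by its construction, $\delta_n \psi_{t_n r_n;\beta} \in \mathcal{H}_d(\beta, L; \mathbb{R}^d)$. Moreover, by the closedness in $L_2$ and the convexity of the sets $\{\phi \in \mathcal{H}_d(\beta, L; \mathbb{R}^d) : \phi(t_n) \ge \delta_n\}$ and $\{\phi \in \mathcal{H}_d(\beta, L; \mathbb{R}^d) : \phi(t_n) \le -\delta_n\}$, it results finally from convex analysis and the definition of $\gamma_\beta$ that

$$\frac{\big|\int \psi_{t_n r_n;\beta}(x)\phi_n(x)\,dx\big|}{\|\psi_{t_n r_n;\beta}\|_2} \;\ge\; \delta_n \|\psi_{t_n r_n;\beta}\|_2 = \delta_n r_n^{d/2} \|\gamma_\beta\|_2.$$

Combining (33) - (35), one verifies for the expression of the right hand side in (32) that it possesses the approximation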

m = Op{l) + VHVAn(l - A„)'5„rf/2||7^||2(1 + ck) - (j^^^"^ ^\og{n/ \ogn) 

- (2:lTd)'^Vi°g(Viog-), 

which goes to infinity for $K$ sufficiently large. If there exists a sequence $(\phi_n)_{n\in\mathbb{N}}$ of admissible alternatives such that $\limsup_{n\to\infty} P_{(m,n,p_n,q_n)}(T_n > \kappa_\alpha(X)) < 1$, there exists by the considerations above a subsequence (for simplicity also denoted by $(n)$) along which $\|\phi_n\|_{\sup}$ stays uniformly bounded away from zero. But the bounds (30) and (31) show that

$$\mathbb{E}S_n(\hat t_n, \hat r_n) - \mathbb{E}S_n(t_n, r_n) = O\big(n^{-1/d + 1/2}\big)(1 + o(1))$$

as well as the logarithmic correction term are in this case of smaller order than $|\mathbb{E}S_n(t_n, r_n)|$, which concludes the proof by contradiction. □
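The rate $n^{-1/d}$ for the distance between a fixed center and the nearest observations, which enters the proof above, can be illustrated in the simplest situation. The sketch below assumes, purely for illustration, that the observations are uniformly distributed on $[0,1]^d$ and that the ball $B_t(x)$ lies entirely inside the cube, so that the probability that none of $n$ observations falls into it equals $(1 - Vx^d)^n$ with $V = \pi^{d/2}/\Gamma(d/2+1)$. The function names are hypothetical, and the general mixed density $h_n$ of the proof is not covered.

```python
import math


def unit_ball_volume(d: int) -> float:
    """Volume V of the d-dimensional Euclidean unit ball, pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def empty_ball_probability(n: int, d: int, x: float) -> float:
    """Probability that none of n iid uniform points on [0,1]^d lies in a ball of
    radius x, assuming the ball is contained in the cube."""
    mass = unit_ball_volume(d) * x ** d
    if not 0.0 <= mass <= 1.0:
        raise ValueError("radius too large for a ball contained in the unit cube")
    return (1.0 - mass) ** n


def scaled_tail(n: int, d: int, c: float) -> float:
    """Empty-ball probability at radius c * n^(-1/d); it stays bounded by exp(-V c^d)."""
    return empty_ball_probability(n, d, c * n ** (-1.0 / d))


if __name__ == "__main__":
    for n in (10 ** 2, 10 ** 4, 10 ** 6):
        print(n, scaled_tail(n, 2, 1.0), math.exp(-unit_ball_volume(2)))
```

After rescaling the radius by $n^{1/d}$ the tail no longer depends on $n$ in the limit and decays like $\exp(-Vc^d)$, which is the kind of bound behind the uniform integrability used above.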
Proof of Theorem 5. Following the considerations of the proof of Theorem 4, it has to be

established that there exist random sequences (jni, kni) ^^^^ with B^^ (W'^'J ^ '^'k 11 2) 

i = 1, fc, such that for any sequence of alternatives as formulated in Theorem [5] and any fixed 

K>0 

Iminf P(m,n,p„,g„) " '^Q®) = 1, i = l,---,k. 

Then the result follows because the finite intersection of sets with asymptotic probability equal to 
1 has asymptotically mass 1 as well. Inspired by the arguments in Rohde (2008) for the univariate 
regression context, we first establish the following: 

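For $\phi_n \in \mathcal{H}_d(\beta, L; [0,1]^d)$ with $\|\phi_n\|_{\sup} \le 1$ and $x^* = \arg\max_{x\in[0,1]^d} |\phi_n(x)|$, there exists some constant $c = c(\beta, L) > 0$ and a compact ball $B = B(\phi_n) \subset \mathbb{R}^d$ with center $x^*$ such that

$$\lambda(B \cap [0,1]^d) \ge c\,|\phi_n(x^*)|^{d/\beta} \quad\text{and}\quad |\phi_n(x)| \ge \tfrac{1}{2}|\phi_n(x^*)| \ \text{ for all } x \in B \cap [0,1]^d. \qquad (36)$$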

Assume that $\beta > 1$ (the above inequality is trivial in case $\beta \le 1$). With $j = (j_1, \dots, j_d)$ we denote subsequently some multi-index, where $|j| = j_1 + \dots + j_d$ defines its length, $x^j := \prod_{i=1}^{d} x_i^{j_i}$ and $D^j := \partial^{|j|}/[\partial x_1^{j_1} \cdot \ldots \cdot \partial x_d^{j_d}]$ the partial differential operator. Let $\phi \in \mathcal{H}_d(\beta, L; [0,1]^d)$ with $\|\phi\|_{\sup} = D > 0$. By the definition of the isotropic Hölder class we have $|\phi(x) - T_y^{\phi}(x)| \le L\|x - y\|_2^{\beta}$ $(\le L d^{\beta/2})$, which entails that $\sup_{x,y\in[0,1]^d} |T_y^{\phi}(x)| \le D + L d^{\beta/2}$. In order to establish (36) note that for any polynomial $P = \sum_{|j| \le \lfloor\beta\rfloor} a_j x^j$, the topology induced by the metrics corresponding to the two norms $\|P\|_{(1)} = \sup_{x\in[0,1]^d} |P(x)|$ and $\|P\|_{(2)} = \max_j |a_j|$ respectively on the ring of polynomials of total degree at most $\lfloor\beta\rfloor$ on $[0,1]^d$ is the topology of uniform convergence, hence these two norms are equivalent. Consequently, the boundedness of the polynomial $T_y^{\phi}$ by $D + L d^{\beta/2}$ uniformly in $y$ implies that there exists some constant $C = C(\beta)$ such that $\|D^j\phi\|_{\sup} \le C(D + L)$ for all multi-indices $j$ with $|j| \le \lfloor\beta\rfloor$. Now the Mean Value Theorem implies for some intermediate point $z \in \{x + t(x^* - x);\ 0 \le t \le 1\}$

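$$|\phi(x) - \phi(x^*)| = \big|(\nabla\phi(z))^T (x - x^*)\big| \le \sqrt{d}\, \sup_{|j|=1} \|D^j\phi\|_{\sup}\, \|x - x^*\|_2 \le \sqrt{d}\, C(D + L)\, \|x - x^*\|_2.$$

Thus,

$$|\phi(x)| \ge \tfrac{1}{2}|\phi(x^*)| \quad\text{for all } x \text{ in } B_{x^*}\Big(\frac{D}{2\sqrt{d}\,C(D + L)}\Big) \cap [0,1]^d.$$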

If $\phi \in \mathcal{H}_d(\beta, L; [0,1]^d)$ with $\|\phi\|_{\sup} = \delta \le 1$, then the function $g_\delta$, for $x \in [0,1]^d$ pointwise defined by $g_\delta(x) := \delta^{-1}\phi(\delta^{1/\beta}x + x^*) \cdot 1\{\delta^{1/\beta}x + x^* \in [0,1]^d\}$, is an element of $\mathcal{H}_d(\beta, L; \mathrm{supp}(g_\delta))$ with $\|g_\delta\|_{\sup} = 1$. Note that $\mathrm{supp}(g_\delta)$ is a convex set. Therefore, the above considerations imply that $|\phi(x)| \ge \delta/2$ on

$$B_{x^*}\Big(\frac{\delta^{1/\beta}}{2\sqrt{d}\,C(1 + L)}\Big) \cap [0,1]^d.$$

But then its Lebesgue measure is always greater than $c\,|\delta|^{d/\beta}$ for some constant $c = c(\beta, L)$, independent of $\delta$ and $x^*$, hence (36) is established.
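The scaling $\delta^{d/\beta}$ in (36) can be made concrete in the case $\beta \le 1$, which the text calls trivial. The sketch below uses a hypothetical example of ours, the cusp $\phi(x) = \max(\delta - L\|x - x^*\|_2^{\beta}, 0)$, for which the set $\{|\phi| \ge \delta/2\}$ is exactly the ball of radius $(\delta/(2L))^{1/\beta}$ around $x^*$. It compares the closed-form volume with a grid evaluation in dimension $d = 1$; all function names are illustrative and not taken from the paper.

```python
import math


def half_level_radius(delta: float, beta: float, L: float) -> float:
    """Radius of the ball around the maximiser on which the cusp stays above delta/2."""
    return (delta / (2.0 * L)) ** (1.0 / beta)


def half_level_volume(delta: float, beta: float, L: float, d: int) -> float:
    """Lebesgue measure of that ball in dimension d (assumed to lie inside the cube)."""
    unit_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return unit_volume * half_level_radius(delta, beta, L) ** d


def cusp(x: float, center: float, delta: float, beta: float, L: float) -> float:
    """One-dimensional cusp max(delta - L|x - center|^beta, 0), Hoelder for beta <= 1."""
    return max(delta - L * abs(x - center) ** beta, 0.0)


def grid_half_level_length(delta: float, beta: float, L: float,
                           center: float = 0.5, grid: int = 20001) -> float:
    """Approximate length of {x in [0,1] : cusp(x) >= delta/2} by counting grid points."""
    step = 1.0 / (grid - 1)
    hits = sum(1 for i in range(grid)
               if cusp(i * step, center, delta, beta, L) >= delta / 2.0)
    return hits * step


if __name__ == "__main__":
    print(half_level_volume(0.2, 0.5, 1.0, 1), grid_half_level_length(0.2, 0.5, 1.0))
```

In this example the constant in (36) can be read off as $c = V(2L)^{-d/\beta}$, so the measure of the half-level set divided by $\delta^{d/\beta}$ does not depend on $\delta$.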
Let now $\beta_i, L_i \in (0, \infty)$ be fixed but arbitrary, $J_i \subset [0,1]^d$ some nondegenerate rectangle, $\phi_n = p_n - q_n$ a sequence of functions with $\phi_n|_{J_i} \in \mathcal{H}_d(\beta_i, L_i; J_i)$. It has to be shown that there exists

a universal constant ki — ki{f3i, Li,c) such that — ^ ^pe>m^q(sin-m) oo whenever 

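$\|\phi_n\|_{J_i} > k_i \rho_{m,n}$. First, we choose a compact ball $B_i(\phi_n)$ with center $x^* := \arg\max_{x\in J_i} |\phi_n(x)|$ satisfying $\lambda(B_i(\phi_n) \cap J_i) \ge c\,|\phi_n(x^*)|^{d/\beta_i}$ and $|\phi_n(x)| \ge \tfrac{1}{2}|\phi_n(x^*)|$ for all $x \in B_i(\phi_n) \cap J_i$, as in (36). Let the couple $(\hat t_n, \hat r_n) := (X_{\hat j_n}, \|X_{\hat j_n} - X_{\hat k_n}\|_2)$ be defined by

$$(\hat j_n, \hat k_n) = \arg\min_{j,k\in\{1,\dots,n\}} \lambda\big(B_{X_j}(\|X_j - X_k\|_2) \,\triangle\, B_i(\phi_n)\big).$$

Consulting the proof of Theorem 4, this definition of $(\hat t_n, \hat r_n)$ allows for an approximation as in (34). Since $|\phi_n(x)| \ge 2^{-1}\|\phi_n\|_{J_i}$ for all $x \in B_i(\phi_n) \cap B_{\hat t_n}(\hat r_n) \cap J_i$,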



ES'„(t„,r„) 1 II 7 II r- ^ -^n) 
77 ^ > :7||'?>«||j.V» , , = 

Af3+d/2)/l3 



EA(B,(0„)nB^Jf„)n[o,i]'^ 



1/2 



for some universal constant $C = C(K, (\lambda_n)_{n\in\mathbb{N}}) > 0$. Now the asserted result is easily deduced for a sufficiently large constant $k_i$. □

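Proof of Theorem 6. (i) Let $(p, q, m, n)$ be such that $h = h_n = 1_{[0,1]^d}$, $(\phi_n)$ the sequence of piecewise constant functions on $[0,1]^d$ with $\phi_n(z) = c_n/\sqrt{n\delta_n^d}$ for $z \in B_x(\delta_n)$, $\phi_n(z) = -c_n/\sqrt{n\delta_n^d}$ for $z \in B_x(\kappa\delta_n) \setminus B_x(\delta_n)$ and equal to zero otherwise, where $\kappa = \kappa(d) > 1$ is such that $\lambda(B_x(\kappa\delta_n) \setminus B_x(\delta_n)) = \lambda(B_x(\delta_n))$ and $0 \le c_n \le \sqrt{n\delta_n^d}$. Then

$$\log\Big(\frac{dP_{(m,n,p_n,q_n)}}{dP_{(m,n,h,h)}}(X)\Big) = \sum_{i=1}^{m} \log\big(1 + (1 - m/n)\phi_n(X_i)\big) + \sum_{j=m+1}^{n} \log\big(1 - (m/n)\phi_n(X_j)\big)$$

with $(X_k)_{k\le n}$ iid uniformly distributed on $[0,1]^d$. Note that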



■'{rn.njijh) 



log 



dF 



{■m,n,pn,qn) 
{m,n,h.h) 



(X) 



^iog(i + (i-wn)-^i?.) + E 



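with $(R_k)_{k\in\mathbb{N}}$ and $(R'_k)_{k\in\mathbb{N}}$ two independent sequences of iid Rademacher variables, $N_m$ and $N_{n-m}$ independent with

$$N_m \sim \mathrm{Bin}(m, 2V\delta_n^d), \quad N_{n-m} \sim \mathrm{Bin}(n - m, 2V\delta_n^d) \quad\text{and}\quad V = \pi^{d/2}/\Gamma(d/2 + 1).$$

Suppose first that $n\delta_n^d \not\to \infty$. By extracting a subsequence if necessary we may assume that $m/n \to \lambda \in (0,1)$, $c_n/\sqrt{n\delta_n^d} \to c \in [0,1]$ and $2Vn\delta_n^d \to \gamma$. Then, with $\to_w$ denoting weak convergence,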



log 



■ {m,n,pn,qn) 



(X) 



(37) 
with the convolution

$$Q = \Big(\sum_{k=0}^{\infty} P_{\gamma(1-\lambda)}(k)\, \mathcal{L}\Big(\sum_{i=1}^{k} \log(1 - \lambda c R_i)\Big)\Big) * \Big(\sum_{k'=0}^{\infty} P_{\gamma\lambda}(k')\, \mathcal{L}\Big(\sum_{i=1}^{k'} \log\big(1 + (1 - \lambda) c R'_i\big)\Big)\Big)$$

and the Poisson weights $P_\mu(k) := e^{-\mu}\mu^k/k!$. Since $\int e^z\,dQ(z) = 1$, we can apply Le Cam's notion of contiguity (Le Cam and Yang 2000, chapter 3) to conclude that

$$\limsup_{n\to\infty} \mathbb{E}_{(m,n,p_n,q_n)}\psi_\alpha(X) < 1.$$

Consequently nJ^ oo . Now assume that nJ^ —> oo but c„ 7^ oo . Without loss of generality we 
may assume that c„ — > c' /VVk^ S [0, 00). Then Lindeberg's CLT entails that ([57]) holds true with 



2 / V 2 

Again, the limiting distribution satisfies J e^dQ{z) = 1, whence c„ ^ 00. 

(ii) We begin as in the proof of Theorem 4 but with $t_n := x$, $r_n := \delta_n$ and $\|\phi_n\|_{\sup} := c_n/\sqrt{n\delta_n^d}$.
Adjusting ^ - yields 



E{Sn(Xn,Sn) - S'„(a;,(5„)) 



= 0(<5-in-i/'^c„) (1 + 0,(1)). 



The arguments of the proof of Theorem [4l apply again and lead to the expansion 

+ ^^[^(1 + - y21og(7„(:.,<5„)-2)^ (38) 

while with the same reasoning as in the proof of Theorem [5] 



for some constant $C = C(d, (\lambda_n)_{n\in\mathbb{N}}) > 0$ and $\sqrt{2\log(\gamma_n(x, \delta_n)^{-2})} = O(1)\sqrt{\log(1/\delta_n)}$. Thus, if $\sqrt{\log(1/\delta_n)}/c_n \to 0$ and $n\delta_n^d \to \infty$, (38) goes to infinity and the result follows. □
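The normalisation $\int e^z\,dQ(z) = 1$ used in part (i) for the Poisson-type limit can be checked numerically. The sketch below is a minimal illustration under assumptions of ours: the Poisson sums are truncated at a finite `k_max`, the expectation over Rademacher signs is computed by exhaustive enumeration, and the parameter values in the example are arbitrary. Since $\mathbb{E}(1 + aR) = 1$ for a Rademacher variable $R$, each factor of the convolution contributes exponential moment one, so the truncated sum should reproduce the truncated Poisson mass. All function names are hypothetical.

```python
import itertools
import math


def poisson_weight(mu: float, k: int) -> float:
    """Poisson weight e^(-mu) mu^k / k!."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def exp_moment_given_count(a: float, k: int) -> float:
    """E exp(sum_{i<=k} log(1 + a R_i)) for iid Rademacher R_i, by enumerating
    all 2^k sign patterns; requires |a| < 1 so that the logarithm is finite."""
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=k):
        total += math.exp(sum(math.log1p(a * r) for r in signs))
    return total / 2 ** k


def truncated_exp_moment(mu: float, a: float, k_max: int = 12) -> float:
    """Exponential moment of one factor of the convolution, Poisson sum cut at k_max."""
    return sum(poisson_weight(mu, k) * exp_moment_given_count(a, k)
               for k in range(k_max + 1))


def truncated_poisson_mass(mu: float, k_max: int = 12) -> float:
    """Total Poisson mass on {0, ..., k_max}."""
    return sum(poisson_weight(mu, k) for k in range(k_max + 1))


if __name__ == "__main__":
    lam, c, gam = 0.3, 0.8, 2.0  # arbitrary illustrative values
    first = truncated_exp_moment(gam * (1 - lam), -lam * c)
    second = truncated_exp_moment(gam * lam, (1 - lam) * c)
    print(first * second)
```

The product of the two factors is the exponential moment of the convolution, which is the quantity required to equal one for the contiguity argument.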

7. Appendix 

We start with a basic but useful property of the solution to (3).

Lemma 11. If the solution to (3) is not of bounded support, its lower isotonic and upper antitonic envelopes are vanishing in $+\infty$.

Proof. Suppose $\psi_\beta$ has only finitely many extremal points. From the last extremal point on, the function $\psi_\beta$ is monotone and the integral over $|\cdot|^{d-1}\psi_\beta(\cdot)^2$ can only be finite if both envelopes are vanishing in $+\infty$. Now consider the case of infinitely many extremal points. Since the $L_2$-norm of the solution to (3) is finite, if there exists a sequence of local extrema of $\psi_\beta$ which stays uniformly bounded away from zero, their width must be bounded by a zero sequence. But now the result follows via contradiction of (36) which, of course, is also applicable for local extrema. □

Let $\varepsilon > 0$ be fixed. Define $t_\varepsilon$ to be a positive real number such that the following conditions are satisfied: $t_\varepsilon$ is a local extremal point, $\int_{B_0(t_\varepsilon)} \gamma_\beta(x)^2\,dx \ge (1 - \varepsilon/2)\|\gamma_\beta\|_2^2$, $\|\psi_\beta\|_{[t_\varepsilon,\infty)} \le \varepsilon/2$ (doable by Lemma 11). Now extend the function $\psi_\beta 1\{\cdot \le t_\varepsilon\}$ to a compactly supported function $G_\varepsilon$ such that $G_\varepsilon(\|\cdot\|_2) \in \mathcal{H}_d(\beta, 1; \mathbb{R}^d)$, $\int G_\varepsilon(\|x\|_2)\,dx = 0$ and $\int_{\mathbb{R}^d \setminus B_0(t_\varepsilon)} G_\varepsilon(\|x\|_2)^2\,dx$ is smaller than $\tfrac{\varepsilon}{2}\|\gamma_\beta\|_2^2$; this is possible for sufficiently large $t_\varepsilon$ (because the uniform boundedness of $\psi_\beta$ on $[t_\varepsilon, \infty)$ yields the boundedness of all partial derivatives by a multiple of $\varepsilon$ with the same argument as used in the proof of Theorem 5, so one may extend the function first to a compactly supported one in $\mathcal{H}_d(\beta, L; \mathbb{R}^d)$ and then extend it close to zero such that its integral vanishes); we omit an explicit construction at this point. With $\varepsilon$ sufficiently small, this construction leads to what was required in the proof of Theorem 3.